Abstract

Quantile  regression  (QR)  fits  a  linear  model  for  conditional  quantiles,  just  as  ordinary  least 
squares  (OLS)  fits  a  linear  model  for  conditional  means.  An  attractive  feature  of  OLS  is  that 
it  gives  the  minimum  mean  square  error  linear  approximation  to  the  conditional  expectation 
function  even  when  the  linear  model  is  misspecified.  Empirical  research  using  quantile  regression 
with  discrete  covariates  suggests  that  QR  may  have  a  similar  property,  but  the  exact  nature 
of  the  linear  approximation  has  remained  elusive.  In  this  paper,  we  show  that  QR  can  be 
interpreted  as  minimizing  a  weighted  mean-squared  error  loss  function  for  specification  error. 
The  weighting  function  is  an  average  density  of  the  dependent  variable  near  the  true  conditional 
quantile.  The  weighted  least  squares  interpretation  of  QR  is  used  to  derive  an  omitted  variables 
bias  formula  and  a  partial  quantile  correlation  concept,  similar  to  the  relationship  between  partial 
correlation  and  OLS.  We  also  derive  general  asymptotic  results  for  QR  processes  allowing  for 
misspecification  of  the  conditional  quantile  function,  extending  earlier  results  from  a  single 
quantile  to  the  entire  process.  The  approximation  properties  of  QR  are  illustrated  through  an 
analysis  of  the  wage  structure  and  residual  inequality  in  US  census  data  for  1980,  1990,  and 
2000.  The  results  suggest  continued  residual  inequality  growth  in  the  1990s,  primarily  in  the 
upper  half  of  the  wage  distribution  and  for  college  graduates. 
1      Introduction

The  Quantile  Regression  (QR)  estimator,  introduced  by  Koenker  and  Bassett  (1978),  is  an 
increasingly  important  empirical  tool,  allowing  researchers  to  fit  parsimonious  models  to  an 
entire  conditional  distribution.  Part  of  the  appeal  of  quantile  regression  derives  from  a  natural 
parallel  with  conventional  ordinary  least  squares  (OLS)  or  mean  regression.  Just  as  OLS 
regression  coefficients  offer  convenient  summary  statistics  for  conditional  expectation  functions 
(CEF),  quantile  regression  coefficients  can  be  used  to  make  easily  interpreted  statements  about 
conditional  distributions.  Moreover,  unlike  OLS  coefficients,  QR  estimates  capture  changes  in 
distribution  shape  and  spread,  as  well  as  changes  in  location. 

An especially attractive feature of OLS regression estimates is their robustness and interpretability
under misspecification of the CEF. In addition to consistently estimating a linear
CEF,  OLS  estimates  provide  the  minimum  mean  square  error  (MMSE)  linear  approximation  to 
a  CEF  of  any  shape.  The  MMSE  interpretation  of  OLS  is  emphasized  by  Chamberlain  (1984) 
and  Goldberger  (1991),  while  an  average  derivative  interpretation  of  OLS  features  in  Angrist 
and  Krueger  (1999).  This  robustness  property  -  i.e.,  the  fact  that  OLS  provides  a  meaningful 
and  well-understood  summary  statistic  for  multivariate  conditional  expectations  under  almost 
all  circumstances  -  undoubtedly  contributes  to  the  primacy  of  OLS  regression  as  an  empirical 
tool.  In  view  of  the  possibility  of  interpretation  under  misspecification,  modern  theoretical 
research  on  regression  inference  typically  also  allows  for  misspecification  when  deriving  limiting 
distributions  (see,  e.g.,  White,  1980). 

While QR estimates are as easy to compute as OLS regression coefficients, an important
difference between OLS and QR is that most of the theoretical and applied work on QR postulates
a true linear model for conditional quantiles. This raises the question of whether and how
QR estimates can be interpreted when the linear model for conditional quantiles is misspecified
(for example, QR estimates at different quantiles may imply conditional quantile functions that
cross). One interpretation for QR under misspecification is that it provides the best linear
predictor for a response variable under asymmetric loss. This interpretation is not very satisfying,
however, since prediction under asymmetric loss is typically not the object of interest in
empirical work (see, e.g., Koenker and Hallock, 2001).1 Empirical research on quantile regression
with  discrete  covariates  suggests  that  QR  may  have  an  approximation  property  similar  to  that 
of OLS, but the exact nature of the linear approximation has remained an important unresolved
question (cf. Chamberlain, 1994, p. 181).

1 An exception is the forecasting literature; see, e.g., Giacomini and Komunjer (2003).

The  first  contribution  of  this  paper  is  to  show  that  QR  can  be  interpreted  as  the  best  linear 
predictor  (BLP)  for  the  conditional  quantile  function  (CQF)  using  a  weighted  mean-squared 
error  loss  function,  much  as  OLS  regression  provides  a  MMSE  fit  to  the  CEF.  The  implied 
QR  weighting  function  can  be  used  to  understand  which,  if  any,  parts  of  the  distribution  of 
regressors  contribute  disproportionately  to  a  particular  set  of  QR  estimates.  We  also  show  how 
the  weighted  mean-square  error  interpretation  can  be  used  to  interpret  QR  coefficients  as  partial 
quantile correlation coefficients and to develop an omitted variables bias formula for QR.

A  second  contribution  is  to  develop  a  distribution  theory  for  the  entire  QR  process  that 
applies  under  misspecification  of  the  conditional  quantile  function.  The  approach  developed 
here  has  two  advantages  over  current  practice.  First,  we  do  not  assume  that  the  true  quantile 
function  is  linear.  Second,  some  of  the  regularity  conditions  that  would  be  required  for  a  fully 
nonparametric  approach,  such  as  multiple  differentiability  of  the  quantile  function  in  regressors 
and  continuity  of  regressors,  are  not  needed.  Our  analysis  of  the  QR  process  extends  the 
results of Chamberlain (1994) and Hahn (1997), who derived the basic variance formula for a
particular quantile under misspecification. See also Koenker and Machado (1999), Gutenbrunner
and Jureckova (1992), and Gutenbrunner, Jureckova, Koenker, and Portnoy (1993), who develop
inference  procedures  based  on  QR  processes  for  the  linear  location  shift  model  and  linear  Pitman 
deviations  from  this  model. 

An  important  consequence  of  our  analysis  is  that  the  currently  used  inference  tools  on  the 
QR  process,  such  as  those  in  Koenker  and  Machado  (1999),  are  not  robust  to  misspecification. 
This is because the limit distribution of the QR process is not distribution-free under misspecification.
Moreover, Khmaladzation techniques, as in Bai (1998) and Koenker and Xiao (2002),
cannot  restore  the  distribution-free  nature  of  the  limit  theory  in  this  case.  We  therefore  suggest 
alternative  methods  that  provide  valid  inference  for  the  QR  process  under  misspecification. 

The  approximation  theorems  and  other  theoretical  ideas  in  the  paper  are  illustrated  with  an 
analysis  of  wage  data  from  the  1980,  1990,  and  2000  U.S.  censuses.  The  analysis  here  is  motivated 
by  similar  studies  in  labor  economics,  where  quantile  regression  has  been  widely  used  to  model 
changes  in  the  wage  distribution  (see,  e.g.,  Buchinsky,  1994  and  Autor,  Katz,  and  Kearney, 
2004  for  the  US;  Gosling,  Machin,  and  Meghir,  2000,  for  the  UK;  Abadie,  1997,  for  Spain,  and 
Machado  and  Mata,  2003,  for  Portugal).  In  particular,  we  show  that  quantile  regression,  while 
an  inexact  model  for  conditional  quantiles,  gives  a  good  account  of  the  relevant  stylized  facts. 
An appealing feature of quantile regression in this context is that quantile regression coefficients
can be used directly to describe "residual inequality," i.e. the spread in the wage distribution
conditional  on  the  variables  included  in  the  quantile  regression  model.  Attempts  to  model 
residual  wage  inequality  have  been  of  major  substantive  importance  to  labor  economists  since 
Juhn,  Murphy,  and  Pierce  (1993). 

The  paper  is  organized  as  follows.  Section  2  introduces  assumptions  and  notation  and 
presents  the  main  approximation  theorems,  followed  by  an  empirical  illustration.  Section  3 
provides the inference theory for QR processes under misspecification. Section 4 presents additional
empirical results on the evolution of residual inequality using data from the 1980, 1990,
and  2000  censuses.  Section  5  concludes  with  a  brief  summary. 

2     Interpreting  QR  Under  Misspecification 
2.1     Notation  and  Framework 

Given a continuous response variable $Y$ and a $d \times 1$ regressor vector $X$, we are interested in
the (population) conditional quantile function (CQF) of $Y$ given $X$. The conditional quantile
function is defined as:

$$Q_\tau(Y|X) = \inf\{y : F_Y(y|X) \ge \tau\}, \tag{1}$$

where $F_Y(y|X)$ is the distribution function for $Y$ conditional on $X$, with associated conditional
density $f_Y(y|X)$. The CQF can also be defined as the solution to the following minimization
problem (assuming integrability throughout where needed):

$$Q_\tau(Y|X) = \arg\min_{q(X)} E[\rho_\tau(Y - q(X))], \tag{2}$$

where $\rho_\tau(u) = (\tau - 1\{u \le 0\})u$ and the minimum is over the set of measurable functions of $X$.
This  is  a  potentially  infinite-dimensional  problem  if  covariates  are  continuous,  and  can  be  very 
high-dimensional  even  with  discrete  X.  It  may  nevertheless  be  possible  to  capture  important 
features  of  the  CQF  using  a  linear  model.    This  motivates  linear  quantile  regression. 
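As an illustration of the minimization problem (2) in its simplest form (no covariates), the following Python sketch checks that a sample quantile minimizes the average check loss. The simulated data, the seed, and the function names `check_loss` and `mean_check_loss` are assumptions made for this example only; they do not come from the paper's data or code.

```python
import random

def check_loss(u, tau):
    """Check function rho_tau(u) = (tau - 1{u <= 0}) * u."""
    return (tau - (1.0 if u <= 0 else 0.0)) * u

def mean_check_loss(ys, q, tau):
    """Sample analog of E[rho_tau(Y - q)] for a constant candidate q."""
    return sum(check_loss(y - q, tau) for y in ys) / len(ys)

random.seed(0)  # illustrative simulated data, not from the paper
ys = sorted(random.gauss(0.0, 1.0) for _ in range(401))
tau = 0.25

# The objective is convex and piecewise linear with kinks at the data points,
# so searching over the observed values is enough to find a minimizer.
q_hat = min(ys, key=lambda q: mean_check_loss(ys, q, tau))

# Here n * tau = 100.25 is not an integer, so the minimizer is the
# 101st order statistic (index 100 in the sorted list).
order_stat = ys[100]
print(q_hat, order_stat)
```

The same loss, with the constant `q` replaced by a linear index in the regressors, gives the QR objective discussed next.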

The Koenker and Bassett (1978) linear quantile regression (QR) estimator solves the following
population minimization problem:

$$\beta(\tau) = \arg\min_{\beta \in \mathbb{R}^d} E[\rho_\tau(Y - X'\beta)]. \tag{3}$$


If the CQF $Q_\tau(Y|X)$ is in fact linear, the QR minimand will find it (just as if the CEF is linear, OLS regression
will find it). More generally, QR provides the best linear predictor for $Y$ under the asymmetric
loss function, $\rho_\tau$. As noted in the introduction, however, prediction under asymmetric loss
is  rarely  the  object  of  empirical  work.  Rather,  the  conditional  quantile  function  is  of  intrinsic 
interest.  For  example,  labor  economists  are  often  interested  in  comparisons  of  conditional  deciles 
as  a  measure  of  how  the  spread  of  a  wage  distribution  changes  conditional  on  covariates,  as  in 
Katz  and  Murphy  (1992)  and  Juhn,  Murphy,  and  Pierce  (1993).  Thus,  we  would  like  to  establish 
the  nature  of  approximation  that  QR  provides. 

2.2      The  QR  Approximation  Property 

Our principal theoretical result is that the population QR vector minimizes a weighted sum
of squared specification errors. This is easiest to show using notation for a quantile-specific
specification error and for a quantile-specific residual. For a given quantile $\tau$, we define the QR
specification error as:

$$\Delta_\tau(X, \beta) = X'\beta - Q_\tau(Y|X). \tag{4}$$

Similarly, let $\epsilon_\tau$ be a quantile-specific residual, defined as the deviation of the response variable
from the conditional quantile of interest:

$$\epsilon_\tau = Y - Q_\tau(Y|X), \tag{5}$$

with conditional density $f_{\epsilon_\tau}(e|X)$ at $\epsilon_\tau = e$. The following theorem shows that QR is the
weighted least squares approximation to the unknown CQF.

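Theorem 1 (Approximation Property) Suppose that (i) the conditional density $f_Y(y|X)$
exists a.s., (ii) $Q_\tau(Y|X)$ is uniquely defined by (2), and (iii) $\beta(\tau)$ is uniquely defined by (3).
Then

$$\beta(\tau) = \arg\min_{\beta \in \mathbb{R}^d} E[w_\tau(X, \beta) \cdot \Delta_\tau^2(X, \beta)], \tag{P1}$$

where

$$w_\tau(X, \beta) = \int_0^1 (1 - u) \cdot f_{\epsilon_\tau}(u \cdot \Delta_\tau(X, \beta)|X) \, du \tag{6}$$

$$= \int_0^1 (1 - u) \cdot f_Y(u \cdot X'\beta + (1 - u) \cdot Q_\tau(Y|X)|X) \, du \ge 0. \tag{7}$$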

This result says that the population QR coefficient vector $\beta(\tau)$ minimizes the expected
weighted mean squared approximation error, i.e. the square of the difference between the true
CQF and the linear approximation, with weighting function $w_\tau(X, \beta)$. The weights involve an
integral in either the conditional density of the quantile residual, or, by a change of variables using
$Y = Q_\tau(Y|X) + \epsilon_\tau$, the conditional density of the response variable. The latter representation
shows the weighting function to be given by the average density of the response variable over
a line from the point of approximation, $X'\beta$, to the true conditional quantile, $Q_\tau(Y|X)$. Pre-multiplication
by the term $(1 - u)$ in the integral results in more weight being applied at points
on the line closer to the true CQF.
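One way to see why the weights in (6)-(7) arise is to check numerically that, at a single value of $X$, the excess expected check loss of a fitted value $X'\beta$ over the true conditional quantile equals $w_\tau(X, \beta) \cdot \Delta_\tau^2(X, \beta)$. The sketch below does this under the assumption that $Y$ given $X$ is normal, which gives a closed form for the expected check loss; the normality, the parameter values, and the function names are illustrative assumptions, not part of the paper.

```python
from statistics import NormalDist

STD = NormalDist()  # standard normal

def expected_check_loss(c, mu, sigma, tau):
    """Closed form of E[rho_tau(Y - c)] when Y ~ N(mu, sigma^2) (assumed here)."""
    z = (c - mu) / sigma
    return (mu - c) * (tau - STD.cdf(z)) + sigma * STD.pdf(z)

def importance_weight(fit, q, mu, sigma, n=2000):
    """Midpoint-rule approximation of (7):
    integral over [0, 1] of (1 - u) * f_Y(u * fit + (1 - u) * q | X) du."""
    dist = NormalDist(mu, sigma)
    total = 0.0
    for k in range(n):
        u = (k + 0.5) / n
        total += (1.0 - u) * dist.pdf(u * fit + (1.0 - u) * q)
    return total / n

# Hypothetical values for one cell of X.
tau, mu, sigma = 0.9, 1.0, 2.0
q = mu + sigma * STD.inv_cdf(tau)  # true conditional quantile Q_tau(Y|X)
fit = q - 1.5                      # a misspecified fitted value X'beta
delta = fit - q                    # specification error, as in (4)

excess_loss = (expected_check_loss(fit, mu, sigma, tau)
               - expected_check_loss(q, mu, sigma, tau))
weighted_sq_error = importance_weight(fit, q, mu, sigma) * delta ** 2
print(excess_loss, weighted_sq_error)
```

Averaging this identity over the distribution of $X$ gives the weighted least squares objective in (P1), up to a term that does not depend on $\beta$.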

We refer to the function $w_\tau(X, \beta)$ as defining importance weights, since this function determines
the importance the QR minimand gives to points in the support of $X$ for a given
distribution of $X$. In addition to the importance weights, the probability distribution of $X$ also
determines the ultimate weight given to different values of $X$ in the least squares problem. To
see this, note that we can also write the QR minimand as

$$\beta(\tau) = \arg\min_{\beta \in \mathbb{R}^d} \int w_\tau(x, \beta) \cdot \Delta_\tau^2(x, \beta) \, dP(x), \tag{8}$$

where $P(x)$ is the CDF of $X$ (with associated probability or density function $p(x)$). Thus, the
overall weight varies in the distribution of $X$ according to

$$w_\tau(x, \beta) \cdot p(x). \tag{9}$$

The  sense  in  which  QR  approximates  a  nonlinear  CQF  can  be  seen  for  an  empirical  wage 
equation  in  Figure  1.  This  figure  plots  an  estimate  of  the  CQF  for  log-earnings  given  education 
for  the  0.10,  0.25,  0.50,  0.75  and  0.90  quantiles,  using  data  for  US-born  black  and  white  men 
aged  40-49  from  the  1980  census  (see  Appendix  for  details  concerning  data).  Here  we  take 
advantage  of  the  discreteness  of  the  schooling  variable  and  the  large  census  sample  to  compare 
QR estimates with the true (sample) CQF evaluated at each point in the support of $X$. In
addition to the dots plotting $Q_\tau(Y|X)$ against $X$, the figure also shows the (solid) QR regression
line.

To compare the consequences of combined importance- and histogram-weighting, as in Theorem 1,
to a weighting scheme using the $X$ histogram only, the figure also shows a graphical
representation of a minimum distance (MD) estimator suggested by Chamberlain (1994). The
MD estimator is the sample analog of the vector $\tilde\beta(\tau)$ solving

$$\tilde\beta(\tau) = \arg\min_{\beta \in \mathbb{R}^d} E[(Q_\tau(Y|X) - X'\beta)^2] = \arg\min_{\beta \in \mathbb{R}^d} E[\Delta_\tau^2(X, \beta)]. \tag{10}$$

In other words, $\tilde\beta(\tau)$ is the slope of the linear regression of $Q_\tau(Y|X)$ on $X$, weighted only by
the probability distribution of $X$, $p(x)$. The dashed line in the figure has the slope determined
by Chamberlain's estimator. Note that unlike QR, the MD estimator relies on the ability to
estimate $Q_\tau(Y|X)$ in a nonparametric first step. This is facilitated here
by the discreteness of $X$ and our large census samples, but would otherwise require additional
restrictions and regularity conditions. Chamberlain (1994) observes that, in general, the MD
estimator is likely to be attractive only when $X$ is low dimensional and the sample size is large.
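With a discrete regressor, the sample analog of (10) is a regression of cell-level quantiles on the cell values, weighted by the cell shares. The sketch below is a minimal illustration for a single regressor plus an intercept. The schooling levels, cell shares, and cell medians are made-up numbers chosen only to show the mechanics, and `weighted_line_fit` is a hypothetical helper name.

```python
def weighted_line_fit(xs, targets, weights):
    """Minimize sum_k weights[k] * (targets[k] - a - b * xs[k])**2 over (a, b)
    by solving the 2x2 weighted normal equations."""
    s_w = sum(weights)
    s_x = sum(w * x for w, x in zip(weights, xs))
    s_xx = sum(w * x * x for w, x in zip(weights, xs))
    s_t = sum(w * t for w, t in zip(weights, targets))
    s_xt = sum(w * x * t for w, x, t in zip(weights, xs, targets))
    det = s_w * s_xx - s_x * s_x
    b = (s_w * s_xt - s_x * s_t) / det
    a = (s_xx * s_t - s_x * s_xt) / det
    return a, b

# Hypothetical cells: years of schooling, share of the sample in each cell,
# and the within-cell median of log earnings (all numbers are made up).
schooling = [8.0, 12.0, 14.0, 16.0]
share = [0.15, 0.45, 0.15, 0.25]
cell_median = [9.60, 9.90, 10.00, 10.25]

md_intercept, md_slope = weighted_line_fit(schooling, cell_median, share)
print(md_intercept, md_slope)
```

Replacing `share` by the product of the cell shares and the importance weights would give the weighting scheme of Theorem 1 instead of the histogram-only weighting used by the MD estimator.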

For every quantile, the QR and MD regression lines are remarkably close, supporting the
conclusion reached in Theorem 1 - that QR is a weighted MD approximation to the unknown
CQF - and suggesting the extra weighting by the importance weights $w_\tau(x, \beta)$ does not induce
big differences between MD and QR. In fact, for some quantiles, the MD and QR lines are
not discernibly different. Under either weighting scheme, the linear fits appear to describe the
actual conditional quantiles reasonably well.

By  way  of  comparison  and  to  provide  a  visual  standard  for  the  goodness  of  fit  of  QR  to  the 
CQF,  the  figure  also  incorporates  a  panel  illustrating  the  fit  of  an  OLS  regression  line  to  the 
CEF.  This  panel  (bottom,  right  position  in  the  figure)  shows  points  on  the  CEF  plotted  as 
dots,  along  with  the  dashed  OLS  regression  line  and  the  solid  generalized  least  squares  (GLS) 
regression line. To compute the GLS slope, $E[Y|X]$ was regressed on $X$, weighted by the inverse
of  the  conditional  variance  of  Y  given  X.  The  OLS  fit  to  the  CEF  is  similar  to  the  QR  fit  to 
the  CQF  at  the  median.  The  estimated  median  QR  and  OLS  regression  slopes  are  also  similar, 
at  6.39  and  6.98  in  percentage  terms.  Panel  A  of  Table  1  reports  the  slopes  of  the  lines  plotted 
in  each  panel  of  Figure  1. 

To further investigate the nature of the QR weighting function in the schooling example,
Figure 2 plots $w_\tau(x, \beta(\tau)) p(x)$ against the regressor $X$. The solid line in the figure shows the
product $w_\tau(x, \beta(\tau)) p(x)$, along with the histogram of education, $p(x)$, the weights used in the
Chamberlain MD estimator. The figure also shows normalized kernel density estimates of the
importance weights, $w_\tau(x, \beta(\tau))$, plotted with a dashed line.2 Consistent with the comparison
of estimators in Figure 1, the importance weights are reasonably flat for the quantiles considered
here, so that most of the variation in the overall weighting function comes from the $X$ histogram.
As in Figure 1, Figure 2 again includes an analogous panel for mean regression and the CEF. The
CEF analog of the QR importance weights is the inverse of $V[Y|X]$, since the latter plays the
role of importance-weighting in GLS estimation. Here too, the importance weighting function
is reasonably flat.


2See  Appendix  B  for  a  detailed  description  of  the  procedure  used  for  kernel  density  estimation  of  the  weights. 


2.3      Conditional  Density  as  Primary  Determinant  of  Importance  Weights 

What features of the joint distribution of $Y$ and $X$ determine the theoretical shape of the
importance weighting function for QR? Suppose initially that the linear model for conditional
quantiles is correct, so the approximation error is zero and $\Delta_\tau(X, \beta(\tau)) = 0$. In this case, the
weighting function when evaluated at $\beta = \beta(\tau)$ simplifies to

$$w_\tau(X, \beta(\tau)) = 1/2 \cdot f_Y(Q_\tau(Y|X)|X), \tag{11}$$

i.e., the weights are proportional to the conditional density of the response variable at the
relevant conditional quantile. More generally, for response data with a smooth conditional
density around the relevant quantile, we have for $\beta$ in the neighborhood of $\beta(\tau)$:

$$w_\tau(X, \beta) = 1/2 \cdot f_Y(Q_\tau(Y|X)|X) + r_\tau(X), \quad \text{where } |r_\tau(X)| \le 1/6 \cdot |\Delta_\tau(X, \beta)| \cdot |\bar{f}'|. \tag{12}$$

Here, $r_\tau(X)$ is a remainder term and the density $f_Y(y|X)$ has a first derivative in $y$ bounded
a.s. by a constant, denoted $\bar{f}'$.3 This argument demonstrates that we can in most cases think of
the density weights $1/2 \cdot f_Y(Q_\tau(Y|X)|X)$ as being the primary determinant of the importance
weights.4 This interpretation applies when the degree of misspecification is modest or the
variability of conditional density $f_Y(y|X)$ in $y$ near the true CQF is not substantial.
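The bound in (12) can be checked numerically for a specific density. The sketch below assumes a normal conditional density, for which the largest absolute value of the derivative is $1/(\sigma^2 \sqrt{2 \pi e})$; the choice of density, the quantile, and the grid of specification errors are assumptions made only for this illustration.

```python
import math
from statistics import NormalDist

def importance_weight(delta, q, dist, n=2000):
    """Midpoint rule for (6): integral over [0, 1] of (1 - u) * f(q + u * delta) du,
    using f_eps(e|X) = f_Y(q + e|X)."""
    total = 0.0
    for k in range(n):
        u = (k + 0.5) / n
        total += (1.0 - u) * dist.pdf(q + u * delta)
    return total / n

tau, sigma = 0.75, 1.0                 # hypothetical values
dist = NormalDist(0.0, sigma)          # assumed conditional distribution of Y
q = dist.inv_cdf(tau)                  # true conditional quantile
density_weight = 0.5 * dist.pdf(q)     # leading term in (12)

# Largest |f'(y)| for a normal density: attained at one standard deviation from the mean.
f_prime_bound = 1.0 / (sigma ** 2 * math.sqrt(2.0 * math.pi * math.e))

checks = []
for delta in (-1.0, -0.3, 0.1, 0.5, 2.0):
    remainder = importance_weight(delta, q, dist) - density_weight
    bound = abs(delta) * f_prime_bound / 6.0
    checks.append((delta, remainder, bound))
    print(delta, remainder, bound)
```

In this example the remainder stays inside the bound for every specification error tried, and shrinks roughly in proportion to the size of the error.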

For the empirical example considered in this section, the weighting function $w_\tau(X, \beta(\tau))$
and the density-based approximation are remarkably close. This can be seen in Figure 3, which
plots estimates of both importance and density weights constructed using a kernel method. The
previous argument suggests there are two reasons for this: the approximation error $\Delta_\tau(X, \beta(\tau))$
is mostly small and the conditional density $f_Y(y|X)$ does not vary much in $y$ near the true
quantiles. The figure also shows that both the weighting function and its first order approximation
are fairly stable, suggesting that the conditional density of $y$ is stable across the levels of
$X$ at each quantile.


3 The remainder term is $|r_\tau(X)| = |w_\tau(X, \beta) - \frac{1}{2} \cdot f_{\epsilon_\tau}(0|X)| = |\int_0^1 (1 - u)(f_{\epsilon_\tau}(u \cdot \Delta_\tau(X, \beta)|X) - f_{\epsilon_\tau}(0|X)) \, du| \le |\Delta_\tau(X, \beta)| \cdot \bar{f}' \cdot \int_0^1 (1 - u) \cdot u \, du = \frac{1}{6} \cdot |\Delta_\tau(X, \beta)| \cdot \bar{f}'$.

4 Powell (1994, p. 2473) notes that an efficient weighted QR estimator (in the sense of attaining the relevant
semiparametric efficiency bound) is obtained by weighting the original Koenker and Bassett QR minimand by
$f_{\epsilon_\tau}(0|X)$. Since the variance of the sample analog of $Q_\tau(Y|X)$ is proportional to $1/f^2_{\epsilon_\tau}(0|X)$, Powell's estimator
is equivalent to a GLS (efficient) estimator for conditional quantiles under correct specification. The first order
asymptotic equivalence of the GLS fit and Powell's estimator under correct specification is noted by Knight (2002).


2.4     Partial  Quantile  Correlation 

The  least  squares  interpretation  of  QR  has  a  practical  payoff  in  that  we  can  use  it  to  develop  a 
regression-decomposition  scheme  and  an  omitted  variables  bias  formula  for  QR.  The  idea  here 
is  to  express  each  QR  coefficient  as  a  coefficient  in  a  bivariate  LS  projection  of  the  unknown 
CQF  on  each  regressor,  after  the  effects  of  other  regressors  have  been  "partialled  out."  Since 
these derivations rely on least-squares algebra, a prerequisite for the development of this decomposition
is a version of the LS approximation property with weights that are fixed in the
optimization problem. This version of the QR minimand is given in the theorem below.

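Theorem 2 (Iterative Approximation Property) Under the conditions of Theorem 1, QR
coefficients satisfy the equation

$$\beta(\tau) = \arg\min_{\beta \in \mathbb{R}^d} E[w_\tau(X) \cdot \Delta_\tau^2(X, \beta)], \tag{P2}$$

where

$$w_\tau(X) = \frac{1}{2} \int_0^1 f_{\epsilon_\tau}(u \cdot \Delta_\tau(X, \beta(\tau))|X) \, du \tag{13}$$

$$= \frac{1}{2} \int_0^1 f_Y(u \cdot X'\beta(\tau) + (1 - u) \cdot Q_\tau(Y|X)|X) \, du. \tag{14}$$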

Theorem  2  differs  from  Theorem  1  in  that  the  weights  are  defined  ex  post,  i.e.,  they  are  defined 
using  the  solution  vector  to  the  QR  problem.  Theorem  2  complements  Theorem  1  in  that  it 
characterizes  the  QR  coefficient  as  a  fixed  point  to  an  iterated  minimum  distance  approximation.5 
The  relationship  between  the  weighting  functions  in  Theorems  1  and  2  is  analogous  to  the 
relationship  between  the  weights  used  to  compute  a  continuously  updated  GMM  Estimator  and 
the  corresponding  iterated  estimator  (see  Hansen,  Heaton,  and  Yaron,  1996). 
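The fixed-point characterization can be illustrated in a small population example. The sketch below assumes a discrete regressor with four support points and a normal conditional distribution with a nonlinear mean and varying spread, so the linear quantile model is misspecified. It first computes the population QR coefficients by a damped Newton iteration on the objective in (3), then forms the weights in (14) at that solution and checks that the weighted least squares fit of the CQF on $X$ returns the same coefficients. All numerical values, the normality assumption, and the function names are hypothetical choices for this illustration.

```python
from statistics import NormalDist

STD = NormalDist()
TAU = 0.25
XS = [0.0, 1.0, 2.0, 3.0]   # support of the scalar regressor (hypothetical)
PS = [0.2, 0.3, 0.3, 0.2]   # probabilities p(x)
MU = [0.0, 1.0, 1.5, 1.6]   # nonlinear conditional mean of Y given x
SD = [1.0, 1.0, 1.5, 2.0]   # conditional standard deviation of Y given x
CQF = [m + s * STD.inv_cdf(TAU) for m, s in zip(MU, SD)]  # true Q_tau(Y|x)

def solve2(a11, a12, a22, b1, b2):
    """Solve the symmetric system [[a11, a12], [a12, a22]] t = (b1, b2)."""
    det = a11 * a22 - a12 * a12
    return (a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det

def qr_objective(b0, b1):
    """Population objective E[rho_tau(Y - b0 - b1 * X)], normal closed form."""
    total = 0.0
    for x, p, m, s in zip(XS, PS, MU, SD):
        z = (b0 + b1 * x - m) / s
        total += p * ((m - b0 - b1 * x) * (TAU - STD.cdf(z)) + s * STD.pdf(z))
    return total

def population_qr(steps=50):
    """Newton iteration with step halving on the convex QR objective."""
    b0, b1 = 0.0, 0.0
    for _ in range(steps):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, p, m, s in zip(XS, PS, MU, SD):
            z = (b0 + b1 * x - m) / s
            g = p * (STD.cdf(z) - TAU)   # gradient contribution
            h = p * STD.pdf(z) / s       # Hessian contribution
            g0 += g
            g1 += g * x
            h00 += h
            h01 += h * x
            h11 += h * x * x
        d0, d1 = solve2(h00, h01, h11, g0, g1)
        step = 1.0
        while step > 1e-12 and qr_objective(b0 - step * d0, b1 - step * d1) > qr_objective(b0, b1):
            step /= 2.0
        b0, b1 = b0 - step * d0, b1 - step * d1
    return b0, b1

def theorem2_weight(fit, q, m, s, n=2000):
    """Midpoint rule for (14): 0.5 * integral of f_Y(u * fit + (1 - u) * q | x) du."""
    dist = NormalDist(m, s)
    total = 0.0
    for k in range(n):
        u = (k + 0.5) / n
        total += dist.pdf(u * fit + (1.0 - u) * q)
    return 0.5 * total / n

def wls(weights):
    """Weighted least squares fit of the CQF on (1, x) with the given cell weights."""
    a11 = sum(weights)
    a12 = sum(w * x for w, x in zip(weights, XS))
    a22 = sum(w * x * x for w, x in zip(weights, XS))
    c1 = sum(w * q for w, q in zip(weights, CQF))
    c2 = sum(w * x * q for w, x, q in zip(weights, XS, CQF))
    return solve2(a11, a12, a22, c1, c2)

qr0, qr1 = population_qr()
w_bar = [theorem2_weight(qr0 + qr1 * x, q, m, s) for x, q, m, s in zip(XS, CQF, MU, SD)]
fp0, fp1 = wls([p * w for p, w in zip(PS, w_bar)])  # weights from (14) times p(x)
md0, md1 = wls(PS)                                  # histogram weights only
print((qr0, qr1), (fp0, fp1), (md0, md1))
```

The weighted fit with weights evaluated at the QR solution reproduces the QR coefficients, while the histogram-only fit is shown for comparison and need not coincide with them.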

The weighting function $w_\tau(X)$ is again related to the conditional density of the dependent
variable. In particular, for a response variable with smooth conditional density around the
relevant quantile, we have

$$w_\tau(X) = 1/2 \cdot f_Y(Q_\tau(Y|X)|X) + r_\tau(X), \quad \text{where } |r_\tau(X)| \le 1/4 \cdot |\Delta_\tau(X, \beta(\tau))| \cdot |\bar{f}'|, \tag{15}$$


5 In other words, given weights defined in terms of $\beta(\tau)$, the solution to the weighted minimum distance
approximation is $\beta(\tau)$. It is easy to show that this fixed point property defines $\beta(\tau)$ uniquely whenever $\beta(\tau)$ is
the unique solution to the original QR problem.


where $r_\tau(X)$ is a remainder term and the density $f_Y(y|X)$ has a first derivative in $y$ bounded
a.s. by a constant, denoted $\bar{f}'$.6 When either $\Delta_\tau(X, \beta(\tau))$ or $\bar{f}'$ is small,

$$w_\tau(X) \approx w_\tau(X, \beta(\tau)) \approx \frac{1}{2} \cdot f_Y(Q_\tau(Y|X)|X), \tag{16}$$

so the approximate weighting function is the same as before when the QR coefficient vector is
evaluated at its solution value.

Partial quantile correlation is defined with regard to a partition of the regression vector into
a variable, $X_1$, and the remaining $d - 1$ variables $X_2$, along with the corresponding partition of
the QR coefficients. Thus,

$$X = [X_1, X_2']', \quad \beta(\tau) = (\beta_1(\tau), \beta_2(\tau)')'. \tag{17}$$

We can now decompose $Q_\tau(Y|X)$ and $X_1$ using orthogonal projections onto $X_2$ weighted by
$w_\tau(X)$, just as can be done for weighted least squares mean regression:

$$Q_\tau(Y|X) = X_2'\pi_Q + q_\tau(Y|X), \text{ such that } E[w_\tau(X) \cdot X_2 \cdot q_\tau(Y|X)] = 0, \tag{18}$$

$$X_1 = X_2'\pi_1 + V_1, \text{ such that } E[w_\tau(X) \cdot X_2 \cdot V_1] = 0. \tag{19}$$

In this decomposition, $q_\tau(Y|X)$ and $V_1$ are residuals created by a weighted linear projection
of the CQF, $Q_\tau(Y|X)$, and $X_1$ on $X_2$, respectively, using $w_\tau(X)$ as weight.7 Then, standard
mathematics for least squares gives


$$\beta_1(\tau) = \arg\min_{\beta_1} E[w_\tau(X) (q_\tau(Y|X) - V_1 \beta_1)^2], \tag{20}$$

and also

$$\beta_1(\tau) = \arg\min_{\beta_1} E[w_\tau(X) (Q_\tau(Y|X) - V_1 \beta_1)^2]. \tag{21}$$

This shows that $\beta_1(\tau)$ can be interpreted as the "partial quantile correlation coefficient" in
the sense that it can be obtained from a regression of the CQF, $Q_\tau(Y|X)$, on $X_1$, once we have
partialled out the effect of $X_2$. Both the partialling-out and second-step regressions are weighted
by the QR weighting function.
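The least-squares algebra behind (18)-(21) can be illustrated in the simplest case where $X_2$ is just the constant, so that partialling out amounts to subtracting weighted means. The sketch below takes cell-level values of the CQF and of the overall weights as given; the numbers are made up for this illustration and are not estimates from the census data, and the variable names are hypothetical.

```python
# Hypothetical cells: X1 = years of schooling, CQF value in each cell, and
# overall weight w_tau(x) * p(x) for each cell (all numbers are made up).
x1 = [8.0, 12.0, 14.0, 16.0]
cqf = [9.60, 9.90, 10.00, 10.25]
weight = [0.03, 0.12, 0.04, 0.06]

def wsum(*columns):
    """Weighted sum of the elementwise product of the given columns."""
    total = 0.0
    for i, w in enumerate(weight):
        term = w
        for col in columns:
            term *= col[i]
        total += term
    return total

# Full weighted regression of the CQF on (1, X1): slope from the 2x2 normal equations.
s_w, s_x, s_xx = wsum(), wsum(x1), wsum(x1, x1)
s_q, s_xq = wsum(cqf), wsum(x1, cqf)
full_slope = (s_w * s_xq - s_x * s_q) / (s_w * s_xx - s_x * s_x)

# Partialling out X2 = 1: weighted projections in (18) and (19).
pi_q = s_q / s_w
pi_1 = s_x / s_w
q_resid = [q - pi_q for q in cqf]   # q_tau(Y|X)
v1 = [x - pi_1 for x in x1]         # V_1

slope_20 = wsum(v1, q_resid) / wsum(v1, v1)  # regression in (20)
slope_21 = wsum(v1, cqf) / wsum(v1, v1)      # regression in (21)
print(full_slope, slope_20, slope_21)
```

All three slopes agree, which is the weighted analog of the usual partialling-out result for mean regression.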

Figure  4  shows  partial  quantile  correlation  plots  for  the  effect  of  schooling  on  wages,  adjusting 
for  the  effect  of  a  quadratic  function  of  potential  experience.8  For  this  example  the  sample  age 


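6 The remainder term is $|r_\tau(X)| = |w_\tau(X) - \frac{1}{2} \cdot f_{\epsilon_\tau}(0|X)| \le \frac{1}{2} \cdot |\Delta_\tau(X, \beta(\tau))| \cdot \bar{f}' \cdot \int_0^1 u \, du = \frac{1}{4} \cdot |\Delta_\tau(X, \beta(\tau))| \cdot \bar{f}'$.

7 Thus, $\pi_Q = E[w_\tau(X) X_2 X_2']^{-1} E[w_\tau(X) X_2 Q_\tau(Y|X)]$ and $\pi_1 = E[w_\tau(X) X_2 X_2']^{-1} E[w_\tau(X) X_2 X_1]$.
8 Potential experience is defined in the standard way as age - years of schooling - 6.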


range  is  extended  to  30-54  to  increase  the  range  of  variation  of  potential  experience,  and  the 
sample  is  restricted  to  white  men.9  The  points  in  the  figure  correspond  to  the  scatterplot  of 
the  partial  residuals  of  the  CQF  of  log-earnings  and  schooling  for  the  0.10,  0.25,  0.50,  0.75  and 
0.90 quantiles, i.e. $q_\tau(Y|X)$ plotted against $V_1$, while the solid line represents the partial QR
slope.  In  this  example,  the  partial  CQF  of  log-earnings  given  schooling  looks  to  be  close  to 
linear  for  every  quantile.  The  dashed  line  is  a  counterfactual  QR  with  the  same  slope  as  for 
schooling  without  controls.  As  for  conventional  least  squares  estimates  (see  bottom  right  panel), 
the  omission  of  experience  causes  downward  bias  in  the  coefficient  of  schooling  for  every  quantile, 
since  experience  and  schooling  are  negatively  correlated  and  experience  raises  wages. 

2.5      Omitted  Variables  Bias 

The  previous  discussion  suggests  we  can  use  a  reasoning  process  much  like  that  for  OLS  when 
analyzing omitted variables bias in the context of QR. Here we use the least squares interpretation
of QR to construct a formal relationship between "long" QR coefficients and "short" QR
coefficients. In particular, suppose we are interested in a quantile regression with explanatory
variables $X = [X_1', X_2']'$, but $X_2$ is not available, e.g. ability in the wage equation. We run QR
on $X_1$ only, obtaining the coefficient vector

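$$\gamma_1(\tau) = \arg\min_{\gamma_1} E[\rho_\tau(Y - X_1'\gamma_1)]. \tag{22}$$

The long regression coefficient vectors are $(\beta_1(\tau), \beta_2(\tau))$, defined by

$$(\beta_1(\tau)', \beta_2(\tau)')' = \arg\min_{\beta_1, \beta_2} E[\rho_\tau(Y - X_1'\beta_1 - X_2'\beta_2)]. \tag{23}$$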

Finally,  it  is  useful  to  define  a  remainder  term 

$$R_\tau(X) = Q_\tau(Y|X) - X_1'\beta_1(\tau), \tag{24}$$

equal to the residual of the CQF, given both $X_1$ and $X_2$, not explained by the linear function
of $X_1$ in the long QR. If the CQF is linear, then $R_\tau(X) = X_2'\beta_2(\tau)$.

The following theorem describes the relationship between $\gamma_1(\tau)$ and $\beta_1(\tau)$.

Theorem 3 (Long and Short Coefficients) Suppose that the conditions of Theorem 1 hold
and $\gamma_1(\tau)$ is uniquely defined by (22). Then,

1.
$$\gamma_1(\tau) = \arg\min_{\gamma_1} E[w^*_\tau(X) \cdot \Delta_\tau^2(X, \gamma_1)], \tag{25}$$


9The  inclusion  of  black  men  complicates  estimation  of  the  weights  and  CQF  because  of  small  cells. 
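where $\Delta_\tau(X, \gamma_1) = X_1'\gamma_1 - Q_\tau(Y|X)$, $\epsilon_\tau = Y - Q_\tau(Y|X)$, and

$$w^*_\tau(X) = \frac{1}{2} \int_0^1 f_{\epsilon_\tau}(u \cdot \Delta_\tau(X, \gamma_1(\tau))|X) \, du. \tag{26}$$

2. If $E[w^*_\tau(X) \cdot X_1 X_1']$ is invertible, $\gamma_1(\tau) = \beta_1(\tau) + B_1(\tau)$, where

$$B_1(\tau) = E[w^*_\tau(X) \cdot X_1 X_1']^{-1} E[w^*_\tau(X) \cdot X_1 R_\tau(X)]. \tag{27}$$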

As with OLS short and long calculations, the omitted variables formula in this case shows the
short QR coefficients to be equal to the corresponding long QR coefficients plus the coefficients
in a weighted projection of omitted effects on included variables. While the parallel with OLS
seems clear, there are two complications in the QR case. First, the effect of omitted variables
appears through the remainder term, $R_\tau(X)$. In practice, it seems reasonable to think of this
as being approximated by the omitted linear part, $X_2'\beta_2(\tau)$. Second, the regression of omitted
variables on included variables is weighted by $w^*_\tau(X)$, while for OLS it is unweighted.10
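A small population example may help fix ideas about (25)-(27). The sketch below assumes the included regressor is only a constant ($X_1 = 1$), the omitted regressor $X_2$ is discrete, and $Y$ given $X_2$ is normal with a linear mean and constant variance, so the long CQF is exactly linear and $R_\tau(X) = X_2'\beta_2(\tau)$. The short QR coefficient is then the unconditional $\tau$-quantile of $Y$, which the code compares with the long intercept plus the bias term in (27). The distributional assumptions, parameter values, and function names are all hypothetical.

```python
from statistics import NormalDist

STD = NormalDist()
TAU = 0.75
X2 = [0.0, 1.0, 2.0]           # support of the omitted regressor (hypothetical)
PS = [0.5, 0.3, 0.2]           # its probabilities
A, B2, SIGMA = 1.0, 0.8, 1.0   # Y | X2 ~ N(A + B2 * X2, SIGMA^2), assumed

# Long QR: the CQF is linear, Q_tau(Y|X) = BETA1 + B2 * X2.
BETA1 = A + SIGMA * STD.inv_cdf(TAU)
CQF = [BETA1 + B2 * x for x in X2]

def marginal_cdf(c):
    """P(Y <= c), a mixture of normal CDFs over the cells of X2."""
    return sum(p * STD.cdf((c - A - B2 * x) / SIGMA) for x, p in zip(X2, PS))

def short_qr_intercept(lo=-20.0, hi=20.0):
    """Short QR on a constant only: the unconditional tau-quantile, by bisection."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if marginal_cdf(mid) < TAU:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def w_star(gamma, q, mean, n=2000):
    """Midpoint rule for (26), using f_eps(e|X) = f_Y(q + e|X) and delta = gamma - q."""
    dist = NormalDist(mean, SIGMA)
    total = 0.0
    for k in range(n):
        u = (k + 0.5) / n
        total += dist.pdf(q + u * (gamma - q))
    return 0.5 * total / n

gamma1 = short_qr_intercept()
weights = [w_star(gamma1, q, A + B2 * x) for x, q in zip(X2, CQF)]
remainder = [B2 * x for x in X2]   # R_tau(X) = X2 * beta2(tau) under linearity

# Bias term (27) with X1 = 1: a weighted average of the omitted effect.
bias = (sum(p * w * r for p, w, r in zip(PS, weights, remainder))
        / sum(p * w for p, w in zip(PS, weights)))
print(gamma1, BETA1 + bias)
```

In this example the short coefficient equals the long intercept plus a weighted mean of the omitted linear part, with weights that differ from the plain probabilities used in the OLS formula.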

3     Large Sample Properties of QR and Robust Inference Under Misspecification

In this section, we study the consequences of misspecification for large sample inference on the
quantile regression process

$$\hat\beta(\tau) = \arg\min_{\beta \in \mathbb{R}^d} \frac{1}{n} \sum_{i=1}^n \rho_\tau(Y_i - X_i'\beta), \quad \tau \in \mathcal{T} = \text{closed subinterval of } (0, 1). \tag{28}$$

The QR process $\hat\beta(\cdot)$, viewed as a function of the probability index $\tau$, is a regression generalization
of the quantile processes and quantile-quantile plots used in univariate and two-sample
treatment control problems, cf. Doksum (1974). To see this, suppose the regressor is a dummy
for receiving a treatment, denoted $D$, so we have $X = (1, D)'$. Then, the components of the
quantile regression process $\hat\beta(\cdot) = (\hat\beta_1(\cdot), \hat\beta_2(\cdot))'$ measure easily interpreted quantities. In particular,
the intercept $\hat\beta_1(\cdot)$ measures the quantile function in the control group, and the slope
$\hat\beta_2(\cdot)$ measures the quantile treatment effect (as a function of the probability $\tau$). When the
regressors are continuous, $\hat\beta_2(\cdot)$ measures the quantile treatment effect as a response to a unit


10 Note that the omitted variables bias formula derived here can be used to determine the bias from measurement
error in regressors, by identifying the error as the omitted variable. For example, classical measurement
error is likely to generate an attenuation bias in QR as well as OLS estimates. We thank Arthur Lewbel for
pointing this out.
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change  in  the  treatment.  Under  misspecification,  the  QR  slope  process,  $2(*),  should  be  inter- 
preted as  approximating  the  quantile  treatment  effect,  while  (3\(»)  approximates  the  quantile 
function  in  the  control  group,  in  the  sense  stated  in  Theorem  1. 
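
The dummy-regressor interpretation can be checked numerically. The following is a minimal sketch, not taken from the paper: with $X = (1, D)'$ the QR objective in (28) splits into a control-group term and a treated-group term, so the fit reduces to two group quantiles. The function names and the simulated data are illustrative assumptions.

```python
import math
import random

def check_loss(u, tau):
    """Check function rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))

def sample_quantile(values, tau):
    """Order statistic with rank ceil(tau * n); it minimizes the average check loss."""
    ordered = sorted(values)
    rank = max(1, math.ceil(tau * len(ordered)))
    return ordered[rank - 1]

def qr_with_dummy(y, d, tau):
    """QR of y on X = (1, D)': the objective splits by group,
    so the fit is a pair of group quantiles."""
    control = [yi for yi, di in zip(y, d) if di == 0]
    treated = [yi for yi, di in zip(y, d) if di == 1]
    intercept = sample_quantile(control, tau)
    slope = sample_quantile(treated, tau) - intercept
    return intercept, slope

def qr_objective(y, d, b1, b2, tau):
    """Sample QR objective from equation (28) for X = (1, D)'."""
    return sum(check_loss(yi - b1 - b2 * di, tau) for yi, di in zip(y, d)) / len(y)

# Simulated data (an assumption): treatment shifts and spreads out the outcome.
rng = random.Random(0)
d = [i % 2 for i in range(400)]
y = [rng.gauss(1.0, 2.0) if di == 1 else rng.gauss(0.0, 1.0) for di in d]

for tau in (0.1, 0.5, 0.9):
    b1, b2 = qr_with_dummy(y, d, tau)
    print(f"tau={tau:.1f}  control quantile={b1:.3f}  quantile treatment effect={b2:.3f}")
```

In this simulated design the treated outcomes are more dispersed, so the estimated slope should vary with $\tau$ rather than being a constant shift.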

Previous studies of the QR process $\hat\beta(\cdot)$ focused on the linear location or scale shift
models, or Pitman deviations from these models. See especially Koenker and Machado (1999),
Gutenbrunner and Jureckova (1992), Gutenbrunner, Jureckova, Koenker, and Portnoy (1993). The
first purpose of this section is to extend previous limit theory for the QR process to allow for
misspecification of any type. The second purpose is to analyze the consequences of misspecification
for currently used inference tools, and derive inference procedures that remain valid under
misspecification.

3.1      Basic  Large  Sample  Properties 

The following conditions are used to ensure consistency:
A.1 $(Y_i, X_i, i \le n)$ are iid on the probability space $(\Omega, \mathcal{F}, P)$ for each $n$.
A.2 The conditional density $f_Y(y \mid X = x)$ exists $P$-a.s.
A.3 $E\|X\| < \infty$, and for all $\tau \in \mathcal{T}$, $\beta(\tau)$ defined to solve

$$E[(\tau - 1\{Y \le X'\beta(\tau)\}) X] = 0 \qquad (29)$$

is the unique solution in $\mathbb{R}^d$.
Theorem 4 (Consistency of QR Process) Under conditions A.1-A.3,

$$\sup_{\tau \in \mathcal{T}} \|\hat\beta(\tau) - \beta(\tau)\| = o_p(1). \qquad (30)$$

The following additional conditions are imposed to obtain asymptotic normality:

A.4 The conditional density $f_Y(y \mid X = x)$ is bounded and uniformly continuous in $y$, uniformly
in $x$ over the support of $(Y, X)$.

A.5 $J(\tau) = E[f_Y(X'\beta(\tau) \mid X) XX']$ is positive definite and finite for all $\tau$, and $E\|X\|^{2+\varepsilon} < \infty$
for some $\varepsilon > 0$.

Theorem 5 (Gaussianity of QR Process) Under A.1-A.5, we have that

$$J(\cdot)\sqrt{n}\,(\hat\beta(\cdot) - \beta(\cdot)) = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} (\cdot - 1\{Y_i \le X_i'\beta(\cdot)\}) X_i + o_p(1)$$

converges weakly to a tight zero mean Gaussian process $z(\cdot)$, in the space of bounded functions
$\ell^\infty(\mathcal{T})$, where $z(\cdot)$ is defined by its covariance function $\Sigma(\tau,\tau') = E[z(\tau)z(\tau')']$, where

$$\Sigma(\tau,\tau') = E[(\tau - 1\{Y \le X'\beta(\tau)\})(\tau' - 1\{Y \le X'\beta(\tau')\}) XX']. \qquad (31)$$

When the model is correctly specified, i.e. $Q_\tau(Y|X) = X'\beta(\tau)$, then

$$\Sigma(\tau,\tau') = \Sigma_0(\tau,\tau') = [\min(\tau,\tau') - \tau\tau'] \cdot E[XX']. \qquad (32)$$

In general, $\Sigma(\cdot,\cdot) \neq \Sigma_0(\cdot,\cdot)$.
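
As a reading aid, equation (32) can be recovered from a short conditional-moment calculation. The intermediate steps below are a reconstruction rather than part of the original text. They assume $\tau \le \tau'$ without loss of generality and use the fact that, under correct specification and A.2, $E[1\{Y \le Q_\tau(Y|X)\} \mid X] = \tau$ and $1\{Y \le Q_\tau(Y|X)\} \cdot 1\{Y \le Q_{\tau'}(Y|X)\} = 1\{Y \le Q_\tau(Y|X)\}$.

```latex
\begin{align*}
E\big[(\tau - 1\{Y \le Q_\tau(Y|X)\})(\tau' - 1\{Y \le Q_{\tau'}(Y|X)\}) \,\big|\, X\big]
  &= \tau\tau' - \tau \cdot \tau' - \tau' \cdot \tau + \min(\tau,\tau') \\
  &= \min(\tau,\tau') - \tau\tau'.
\end{align*}
```

Multiplying by $XX'$ and taking expectations over $X$ gives $\Sigma_0(\tau,\tau') = [\min(\tau,\tau') - \tau\tau'] \cdot E[XX']$.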

The  proof  of  this  result  (in  the  appendix)  is  of  independent  interest,  since  it  does  not  rely  on 
either  convexity  arguments,  which  are  not  applicable  for  the  process  case,  or  explicit  chaining 
arguments,  which  are  case-specific  and  therefore  difficult  to  establish  for  all  QR  problems  (see 
e.g.  Portnoy,  1991).  In  contrast,  the  proof  relies  primarily  on  the  fact  that  the  functional  class 
$\{1\{Y \le X'\beta\}, \beta \in \mathbb{R}^d\}$ is Donsker. Thus, the theorem easily extends to a wide range of cases
where  a  uniform  central  limit  theorem  holds  for  this  functional  class.  In  particular,  extensions 
to  strong,  uniformly  mixing,  and  various  Markovian  data  are  immediate. 

Theorem 5 allows for misspecification and imposes little structure on the underlying
conditional quantile function $Q_\tau(Y|X)$. For example, smoothness of $Q_\tau(Y|X)$ in $X$, which
the fully nonparametric estimation approach requires, is not needed. Theorem 5
also has important consequences for general inference on the QR process, since it implies that
$(E[XX'])^{-1}\Sigma(\tau,\tau')$ is not proportional to the covariance function of the standard $d$-dimensional
Brownian bridge $[\min(\tau,\tau') - \tau\tau'] \cdot I$, unlike in the correctly specified case, where

$$(E[XX'])^{-1}\Sigma_0(\tau,\tau') = [\min(\tau,\tau') - \tau\tau'] \cdot I, \qquad (33)$$

which in turn implies that the conventional inference methods developed in Koenker and Machado
(1999) do not apply under misspecification. Moreover, the problem of a nonstandard covariance
function cannot be alleviated by the Khmaladzation techniques implemented in Koenker and
Xiao (2002). We therefore rely on Theorem 5 to develop general inference methods on the QR
process that are robust to misspecification.

An important though previously known corollary of Theorem 5 is that the conventional
standard errors used for basic pointwise inference are not robust to misspecification. This follows
from the fact that the covariance kernel $\Sigma(\tau,\tau')$ generally differs from $\Sigma_0(\tau,\tau')$. In particular,
we have:

Corollary 1 (Finite-Dimensional Limit Theory) Under A.1-A.5, for a finite collection
$\tau_k \in \mathcal{T}$, $k = 1, ..., K$, the regression quantile statistics $\sqrt{n}(\hat\beta(\tau_k) - \beta(\tau_k))$ are asymptotically jointly
normal, with asymptotic variance given by $J(\tau_k)^{-1}\Sigma(\tau_k,\tau_k)J(\tau_k)^{-1}$ and asymptotic covariance
between the $k$-th and $l$-th subsets equal to $J(\tau_k)^{-1}\Sigma(\tau_k,\tau_l)J(\tau_l)^{-1}$. Under correct specification
$\Sigma(\cdot,\cdot)$ is replaced with $\Sigma_0(\cdot,\cdot)$ in these expressions.

Chamberlain (1994) and Hahn (1997) give this result for a single quantile, that is for a given
$\tau \in \mathcal{T}$, $\sqrt{n}(\hat\beta(\tau) - \beta(\tau)) \to_d N(0, V(\tau))$ with $V(\tau) = J^{-1}(\tau)\Sigma(\tau,\tau)J^{-1}(\tau)$.11 Under correct specification
the variance formula simplifies to $V_0(\tau) = J^{-1}(\tau)\,\tau(1-\tau)\,E[XX']\,J^{-1}(\tau)$. Hence commonly
reported estimates of $V_0(\tau)$ are inconsistent for $V(\tau)$ under misspecification except for the median,
i.e. $\tau = 0.5$. (In this case, the two formulae coincide because $[\tau - 1\{Y \le X'\beta(\tau)\}]^2 = 1/4 =
\tau(1-\tau)$ for $\tau = 0.5$.) Also, since the difference $V(\tau) - V_0(\tau)$ is

$$(1 - 2\tau) \cdot J(\tau)^{-1} \cdot E\big((1\{Y \le X'\beta(\tau)\} - 1\{Y \le Q_\tau(Y|X)\}) \cdot XX'\big) \cdot J(\tau)^{-1}, \qquad (34)$$

we have that, for the same degree of misspecification, the difference grows as we move away
from the median and it can be positive or negative depending on the sign of specification error
and its correlation with the elements of $XX'$. For example, if $X$ is one-dimensional and $Y$
is positive, then for $\tau < 1/2$ the difference between $V(\tau)$ and $V_0(\tau)$ will be positive if the
corresponding conditional quantile is lower than the linear approximation for higher absolute
values of the regressor, and negative otherwise, i.e. if the conditional quantile is above the linear
approximation for these values.
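
The expression in (34) follows from expanding the square in $\Sigma(\tau,\tau)$. The steps below are a sketch reconstructed from equations (31) and (32), not taken from the original text. They use only that an indicator equals its own square and that $E[1\{Y \le Q_\tau(Y|X)\} \mid X] = \tau$.

```latex
\begin{align*}
(\tau - 1\{Y \le X'\beta(\tau)\})^2
  &= \tau^2 + (1 - 2\tau)\, 1\{Y \le X'\beta(\tau)\}, \\
\tau(1 - \tau)
  &= \tau^2 + (1 - 2\tau)\, E\big[1\{Y \le Q_\tau(Y|X)\} \,\big|\, X\big], \\
\Sigma(\tau,\tau) - \Sigma_0(\tau,\tau)
  &= (1 - 2\tau)\, E\big[(1\{Y \le X'\beta(\tau)\} - 1\{Y \le Q_\tau(Y|X)\})\, XX'\big].
\end{align*}
```

Pre- and post-multiplying the last line by $J(\tau)^{-1}$ gives the difference between $V(\tau)$ and $V_0(\tau)$, which vanishes at $\tau = 0.5$ because of the factor $(1 - 2\tau)$.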

Table 1 illustrates these basic implications by reporting estimates of the schooling coefficients
and their asymptotic standard errors, using the two alternative formulae $V_0(\tau)$ and $V(\tau)$, for the
empirical example considered in the previous section. Panel A reports QR and OLS coefficients
from regressions of log-earnings on schooling for the 1980, 1990 and 2000 census samples, while
Panel B presents the same schooling coefficients from a model that also controls for race and a
quadratic function of potential experience. The standard errors were estimated using equations
(44)-(46), below. The alternative estimates of the standard errors are fairly close, with the
biggest differences for tail quantiles (0.10 and 0.90). Here, the commonly reported standard
error is biased downwards since, for the high levels of schooling where misspecification is more
severe, the conditional quantile is below the linear approximation for the 0.10 quantile, while it
is above the linear approximation for the 0.90 quantile.


11 See also Kim and White (2002).

3.2 Simultaneous (Uniform) Inference

An alternative to pointwise inference is Kolmogorov-type uniform inference on the QR
process. Uniform inference provides a parsimonious strategy for the study of changes in an entire
response distribution. Here we derive robust uniform confidence regions that allow us to
simultaneously test, in the Scheffé sense, a variety of potentially multi-faceted hypotheses about
conditional distributions without compromising significance levels. Examples include specification
tests (omission of variables), stochastic dominance, constant treatment effects, and changes
in distribution.

Of course, a finite number of quantile regression coefficients are always estimated in
practice. Nevertheless, it is still convenient to treat the quantile-specific estimates as realizations
of a stochastic process rather than as a large vector of parameters. To see this, consider the
construction of joint confidence intervals for, say, $d = 2$ of the coefficients from a quantile
regression, estimated at $K = 20$ different quantiles (i.e., increments of .05). The number of variance
and covariance terms to be estimated is $dK(dK + 1)/2 = 820$. The functional limit result in
Theorem 5 allows us to avoid this high-dimensional estimation problem.12 This approach also
leads to a convenient graphical inference procedure, illustrated below.

The simplest use of the quantile regression process is to test linear hypotheses of the form:

$$H_0: R(\tau)'\beta(\tau) = r(\tau) \text{ for all } \tau \in \mathcal{T}. \qquad (35)$$

For example, we might want to test whether the coefficient corresponding to the variable $j$ is
zero over the whole quantile process, i.e. whether

$$\beta_j(\tau) = 0 \text{ for all } \tau \in \mathcal{T}. \qquad (36)$$

This corresponds to $R(\tau) = [0, ..., 1, ..., 0]'$ with 1 in the $j$-th position and $r(\tau) = 0$. Similarly,
we may want to construct uniform or simultaneous confidence intervals for parameters or for
linear functions of parameters of the form

$$R(\tau)'\beta(\tau) - r(\tau) \text{ for all } \tau \in \mathcal{T}. \qquad (37)$$

The following corollaries facilitate both hypothesis testing and the construction of confidence
intervals in this framework:


12 Formally, this is because the empirical quantile regression process $\sqrt{n}(\hat\beta(\cdot) - \beta(\cdot))$ asymptotically behaves
continuously, so that $\sqrt{n}(\hat\beta(\cdot) - \beta(\cdot))$ is approximately equivalent to a large finite collection of regression quantiles
$\sqrt{n}(\hat\beta(\tau_k) - \beta(\tau_k))$, $k = 1, ..., K$, for a suitably fine grid of quantile indices $\mathcal{T}_K = \{\tau_k, k = 1, ..., K\} \subset \mathcal{T}$.

Corollary 2 (Kolmogorov Statistic) Under the conditions of Theorem 5, and (35), for any
$\hat{V}(\tau) = V(\tau) + o_p(1)$ uniformly in $\tau \in \mathcal{T}$,

$$K_n = \sup_{\tau \in \mathcal{T}} \left| [R(\tau)'\hat{V}(\tau)R(\tau)]^{-1/2} \sqrt{n}\,(R(\tau)'\hat\beta(\tau) - r(\tau)) \right| \to_d \mathcal{K}, \qquad (38)$$

where $|x|$ denotes the sup norm of a vector, i.e. $|x| = \max_j |x_j|$, and $\mathcal{K}$ is a random variable
with an absolutely continuous distribution, defined as

$$\mathcal{K} = \sup_{\tau \in \mathcal{T}} \left| [R(\tau)'V(\tau)R(\tau)]^{-1/2} R(\tau)'J(\tau)^{-1}z(\tau) \right|. \qquad (39)$$

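Corollary 3 (General Uniform Inference) Then, for $\kappa(\alpha)$ denoting the $\alpha$-quantile of $\mathcal{K}$, and
$\hat\kappa(\alpha)$ any consistent estimate of it,

$$\lim_{n \to \infty} P\{R(\tau)'\beta(\tau) - r(\tau) \in \hat{I}_n(\tau), \text{ for all } \tau \in \mathcal{T}\} = 1 - \alpha, \qquad (40)$$

where

$$\hat{I}_n(\tau) = \{u(\tau): |[R(\tau)'\hat{V}(\tau)R(\tau)]^{-1/2} \sqrt{n}\,(R(\tau)'\hat\beta(\tau) - r(\tau) - u(\tau))| \le \hat\kappa(1-\alpha)\}. \qquad (41)$$

For example, when $R(\tau)'\beta(\tau) - r(\tau)$ is scalar, we have

$$\hat{I}_n(\tau) = \left[ R(\tau)'\hat\beta(\tau) - r(\tau) \pm \hat\kappa(1-\alpha) \cdot \sqrt{R(\tau)'\hat{V}(\tau)R(\tau)/n} \right]. \qquad (42)$$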
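
The scalar band in (42) is easy to compute once the ingredients are available. The sketch below is illustrative only: the function names are hypothetical, the critical value is passed in as a given number, and the inputs in the usage lines are made-up toy values rather than estimates from the paper. The second helper checks whether a single constant fits inside the band at every quantile index, which is one way to examine a constant-effect hypothesis graphically.

```python
import math

def confidence_band(estimates, variances, n, critical_value):
    """Band estimate +/- c * sqrt(V_hat / n) at each quantile index (scalar case)."""
    band = []
    for est, var in zip(estimates, variances):
        half_width = critical_value * math.sqrt(var / n)
        band.append((est - half_width, est + half_width))
    return band

def contains_horizontal_line(band):
    """True if some constant lies inside the band at every quantile index."""
    return max(lo for lo, _ in band) <= min(hi for _, hi in band)

# Toy inputs (made up for illustration): three quantile indices, n = 100.
toy_estimates = [0.0, 0.5, 1.0]
toy_variances = [4.0, 4.0, 4.0]
pointwise = confidence_band(toy_estimates, toy_variances, 100, 1.96)
uniform = confidence_band(toy_estimates, toy_variances, 100, 2.8)
print(contains_horizontal_line(pointwise), contains_horizontal_line(uniform))
```

The only difference between a pointwise and a uniform band in this sketch is the critical value: a normal quantile in the first case, an estimate of the quantile of the sup-statistic in the second.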


The critical values $\kappa(\alpha)$ can easily be obtained by subsampling in cross-sectional applications of
the sort considered here. Let $I_1, ..., I_B$ be $B$ randomly chosen subsamples of $(Y_i, X_i, i \le n)$ of
size $b$, where $b \to \infty$, $b/n \to 0$, $B \to \infty$ as $n \to \infty$. First, compute the test statistic for each
subsample

$$K_{I_j,b} = \sup_{\tau \in \mathcal{T}} \left| [R(\tau)'\hat{V}(\tau)R(\tau)]^{-1/2} \sqrt{b}\,R(\tau)'(\hat\beta_{I_j,b}(\tau) - \hat\beta(\tau)) \right|, \qquad (43)$$

where $\hat\beta_{I_j,b}(\tau)$ is the QR estimate using subsample $I_j$. Then, define $\hat\kappa(\alpha)$ as the $\alpha$-quantile of
the subsampling sequence $\{K_{I_1,b}, ..., K_{I_B,b}\}$. If recomputation of quantiles is not desirable, one
can replace $\sqrt{b}(\hat\beta_{I_j,b}(\tau) - \hat\beta(\tau))$ by its first order approximation, which is a re-centered one-step
estimator: $\Delta_{I_j,b}(\tau) = \hat{J}(\tau)^{-1} \frac{1}{\sqrt{b}} \sum_{i \in I_j} (\tau - 1\{Y_i \le X_i'\hat\beta(\tau)\}) X_i$.
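
The subsampling recipe can be sketched in the simplest possible setting, an intercept-only model in which the QR estimate at each $\tau$ is a sample quantile and $R(\tau) = 1$. This is an illustration under stated assumptions, not the authors' implementation: the data are simulated, the bandwidth constant is set to 1, and all function names are hypothetical.

```python
import math
import random

def grid_quantiles(values, tau_grid):
    """Sample quantiles (rank ceil(tau * n)) on a grid, sorting only once."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[max(1, math.ceil(t * n)) - 1] for t in tau_grid]

def subsampling_critical_value(y, tau_grid, b, n_subsamples, level, rng):
    """Quantile `level` of the subsampled sup-statistics K_{I_j,b} (scalar case)."""
    n = len(y)
    h = n ** (-1.0 / 3.0)  # bandwidth c * n^(-1/3) with c = 1 (an assumption)
    full = grid_quantiles(y, tau_grid)
    scale = []  # sqrt(V_hat(tau)) = sqrt(Sigma_hat(tau, tau)) / J_hat(tau)
    for tau, q in zip(tau_grid, full):
        j_hat = sum(1 for yi in y if abs(yi - q) <= h) / (2.0 * n * h)
        sigma_hat = sum((tau - (1.0 if yi <= q else 0.0)) ** 2 for yi in y) / n
        scale.append(math.sqrt(sigma_hat) / j_hat)
    stats = []
    for _ in range(n_subsamples):
        # Draw a subsample without replacement and recompute the quantile process.
        sub = grid_quantiles(rng.sample(y, b), tau_grid)
        stats.append(max(abs(math.sqrt(b) * (qb - q) / s)
                         for qb, q, s in zip(sub, full, scale)))
    return grid_quantiles(stats, [level])[0]

rng = random.Random(0)
y = [rng.gauss(0.0, 1.0) for _ in range(2000)]   # simulated outcome (an assumption)
tau_grid = [k / 20 for k in range(2, 19)]        # .10, .15, ..., .90
b = int(5 * len(y) ** 0.4)                       # subsample size b = 5 n^(2/5)
kappa_95 = subsampling_critical_value(y, tau_grid, b, 200, 0.95, rng)
print(f"b = {b}, subsampling critical value kappa_hat(.95) = {kappa_95:.2f}")
```

With regressors, the same loop applies after replacing the sample quantiles by QR fits and the scalar scale by $[R(\tau)'\hat{V}(\tau)R(\tau)]^{1/2}$.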

Corollary 4 (Consistent $\hat\kappa(\alpha)$) The estimator $\hat\kappa(\alpha)$, described above, is consistent for $\kappa(\alpha)$.

As noted above, in practice we replace the continuum of quantile indices $\mathcal{T}$ by a finite
grid $\mathcal{T}_{K_n}$, where the distance between adjacent grid points goes to zero as $n \to \infty$. Since the
inference processes considered are stochastically equicontinuous, this replacement does not affect
the asymptotic theory.

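To make previous inference methods operational, we also need uniformly consistent
estimators for the components of the variance formulae:

$$\hat\Sigma(\tau,\tau') = \frac{1}{n} \sum_{i=1}^{n} (\tau - 1\{Y_i \le X_i'\hat\beta(\tau)\})(\tau' - 1\{Y_i \le X_i'\hat\beta(\tau')\}) \cdot X_iX_i', \qquad (44)$$

$$\hat\Sigma_0(\tau,\tau') = [\min(\tau,\tau') - \tau\tau'] \cdot \frac{1}{n} \sum_{i=1}^{n} X_iX_i', \qquad (45)$$

$$\hat{J}(\tau) = \frac{1}{2nh_n} \sum_{i=1}^{n} 1\{|Y_i - X_i'\hat\beta(\tau)| \le h_n\} \cdot X_iX_i', \qquad (46)$$

where $h_n$ is such that $h_n \to 0$ and $h_n^2 n \to \infty$.13 The next result establishes the uniform
consistency of these estimators.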

Corollary 5 The estimators shown in equations (44)-(46) are uniformly consistent in $(\tau,\tau')$ if
$E\|X\|^4 < \infty$.
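
The estimators in (44)-(46) can be illustrated with a scalar, strictly positive regressor and no intercept, a case in which the QR fit has a closed form. This is a sketch under those simplifying assumptions, not the authors' code: the data are simulated with conditional quantiles that are quadratic in $x$, so the linear model is misspecified, and the bandwidth constant is set to 1.

```python
import math
import random

def weighted_quantile(values, weights, tau):
    """Smallest value whose cumulative weight reaches a tau share of the total."""
    total, running = sum(weights), 0.0
    pairs = sorted(zip(values, weights))
    for value, weight in pairs:
        running += weight
        if running >= tau * total:
            return value
    return pairs[-1][0]

def qr_slope(y, x, tau):
    """QR of y on a positive scalar x with no intercept. Since
    rho_tau(y - x*b) = x * rho_tau(y/x - b) for x > 0, the minimizer is the
    x-weighted tau-quantile of the ratios y/x."""
    return weighted_quantile([yi / xi for yi, xi in zip(y, x)], x, tau)

def indicator(condition):
    return 1.0 if condition else 0.0

def sigma_hat(y, x, tau1, tau2, b1, b2):
    """Equation (44) with scalar x, so X_i X_i' = x_i^2."""
    return sum((tau1 - indicator(yi <= xi * b1)) * (tau2 - indicator(yi <= xi * b2)) * xi * xi
               for yi, xi in zip(y, x)) / len(y)

def sigma0_hat(x, tau1, tau2):
    """Equation (45)."""
    return (min(tau1, tau2) - tau1 * tau2) * sum(xi * xi for xi in x) / len(x)

def j_hat(y, x, b, h):
    """Equation (46): kernel-type estimate of J(tau) with bandwidth h."""
    return sum(xi * xi for yi, xi in zip(y, x) if abs(yi - xi * b) <= h) / (2.0 * len(y) * h)

rng = random.Random(0)
n = 2000
x = [rng.uniform(1.0, 3.0) for _ in range(n)]
y = [xi * xi + rng.gauss(0.0, 1.0) for xi in x]  # quadratic CQF: linear fit is misspecified
h = n ** (-1.0 / 3.0)                            # c * n^(-1/3) with c = 1 (an assumption)

for tau in (0.1, 0.5, 0.9):
    b = qr_slope(y, x, tau)
    j = j_hat(y, x, b, h)
    se_robust = math.sqrt(sigma_hat(y, x, tau, tau, b, b) / j ** 2 / n)
    se_conventional = math.sqrt(sigma0_hat(x, tau, tau) / j ** 2 / n)
    print(f"tau={tau:.1f}  beta={b:.3f}  robust se={se_robust:.4f}  conventional se={se_conventional:.4f}")
```

At $\tau = 0.5$ the two standard errors coincide by construction, because $(0.5 - 1\{\cdot\})^2 = 1/4$ for every observation; away from the median they can differ under misspecification.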

Figure 5 illustrates uniform inference in our empirical example. The figure shows robust
pointwise and uniform 95% confidence intervals for the schooling coefficient $\hat\beta(\cdot)$ from quantile
regressions of log-earnings on schooling, race and a quadratic function of experience, using data
from the 1980, 1990 and 2000 censuses. The horizontal lines indicate the corresponding OLS
estimates. The uniform bands were obtained by subsampling using 200 repetitions ($B = 200$)
with subsample size $b = 5n^{2/5}$, and a grid of quantiles $\mathcal{T}_{K_n} = \{.1, .15, ..., .9\}$.14

The  figure  suggests  the  returns  to  schooling  were  low  and  essentially  flat  across  quantiles 
in 1980 (except for $\tau > .85$, where they shift up), a finding similar to Buchinsky's (1994)
using  Current  Population  Surveys  (CPS)  for  this  period.  On  the  other  hand,  the  returns 
increased  sharply  and  became  more  heterogeneous  in  1990  and  especially  in  2000,  a  result  we 
also  confirmed  in  the  CPS.  Since  the  uniform  confidence  bands  do  not  contain  a  horizontal  line, 
we  can  reject  the  hypothesis  of  homogeneous  returns  to  schooling  for  1990  and  2000.  Moreover, 
the  uniform  band  for  1990  does  not  overlap  with  the  1980  band,  suggesting  a  marked  and 
statistically  significant  change  in  the  relationship  between  schooling  and  the  conditional  wage 
distribution  in  this  period.15  A  variety  of  other  hypotheses  regarding  the  returns  to  schooling 
can similarly be tested using Figure 5. Note also that the uniform bands are not much wider than
the corresponding pointwise bands due to the high correlation between individual coefficients in
the QR process.


13 Following Koenker (1994), we use Hall and Sheather's (1988) rule setting $h_n = c \cdot n^{-1/3}$.

14 Chernozhukov (2002) discusses subsampling for QR inference in greater detail.

15 Using Bonferroni bounds, our graphical test that looks for overlap in two 95% confidence bands has a
significance level of approximately 10% ($1 - .95^2$). A test with exactly 5% size can be obtained by constructing
confidence bands for the difference in estimated quantiles across years, again using the procedure outlined in
section 3.2.

4 Estimates of Changing Residual Inequality

One of the most significant and widely-studied developments in the American economy in the
last three decades is the changing wage structure. The broad pattern has been one of increasing
inequality, as measured by either the variance or the gap between upper and lower quantiles of
the wage distribution. For example, Katz and Autor (1999) note that the 90-10 ratio (i.e. the
ratio of the .9 and .1 quantiles) increased by 25 percent from 1979 to 1995. Wage inequality
appears to have continued to increase since 1995, though the recent inequality trend is less
clear-cut due in part to changes in the way US wage data are collected (Lemieux, 2003).

The increase in wage inequality is typically described as arising in two ways: increasing wage
differentials associated with observed worker characteristics such as education and experience,
and increased dispersion conditional on these characteristics. The first, known as "between-group
inequality," has increased as a consequence of changes in the distribution of characteristics,
and especially changes in the economic returns to these characteristics. For example, increases
in the economic return to schooling have been an important factor working to increase overall
wage dispersion. The second, known as within-group or "residual inequality," is - by definition
- not directly linked to changes in the distribution of covariates or their returns, though increases
in residual inequality are sometimes said to reflect increasing returns to "unobserved skills" (as
in Juhn, Murphy, and Pierce, 1993).

An appealing feature of quantile regression as a tool for understanding wage inequality is that
QR coefficients can easily be used to construct a measure of within-group or residual inequality.
To see this, note that if we approximate $Q_\tau(Y|X)$ by $X'\beta(\tau)$, with log wages as the dependent
variable, then the log of the within-group $\tau$ to $\tau'$ ratio is provided by $X'[\beta(\tau) - \beta(\tau')]$. This fact highlights
a key difference between quantile regression and mean regression: a ceteris paribus increase in
an OLS regression coefficient increases a variance-based measure of between-group inequality,
without changing within-group inequality as measured by the residual variance. In contrast,
a ceteris paribus increase in any non-central quantile, $\tau$, increases within-group inequality as
measured by the spread from the $\tau$ to $1 - \tau$ quantiles.

4.1 The QR Summary Picture

Our goal in this brief empirical section is to use linear QR to measure changing residual
inequality in the 1980, 1990, and 2000 censuses. To mitigate the impact of changes in labor force
participation, we continue to focus on a prime-age sample consisting of US-born white and black
men aged 40-49.

Figure  6  provides  a  compact  QR-generated  summary  of  the  evolution  of  residual  inequality 
from  1980  through  2000.  The  figure  plots  the  averaged  (across  covariates)  conditional  quantiles 
of  earnings,  as  predicted  from  a  QR  model  controlling  for  schooling,  race,  and  a  quadratic 
function  of  potential  experience.  The  leftmost  panel  shows  the  unconditional  quantiles,  i.e.,  the 
marginal  earnings  distribution;  the  middle  panel  conditions  on  covariate  means  for  each  year; 
the  third  panel  fixes  the  covariate  means  at  their  1980  values.16  Each  panel  shows  quantiles  for 
the  three  census  years,  plotted  using  a  line-width  determined  by  the  uniform  inference  bands  for 
fitted  values  derived  from  our  QR  estimates  of  the  quantile  process.  To  facilitate  a  comparison 
of  inequality  while  holding  location  fixed,  the  line  for  each  year  is  centered  at  median  earnings 
for  that  year. 

The  largest  shift  in  unconditional  distributions  occurred  between  1980  and  1990,  primarily 
in  the  lower  half  of  the  earnings  distribution.  This  shift  is  statistically  significant,  as  can  be 
seen  from  the  fact  that  the  bands  for  these  two  years  do  not  overlap.  A  comparison  of  Panels  B 
and  C  with  Panel  A  shows  the  residual  distribution  shifting  more  smoothly  than  the  marginal 
distribution.  This  is  because  conditioning  smooths  out  some  of  the  heaping  commonly  found  in 
survey-based  earnings  data.  Panel  B  shows  a  clear  increase  in  residual  inequality  from  1980  to 
1990,  with  a  continuing  increase  from  1990  to  2000.  An  interesting  feature  of  the  latter  increase, 
however,  is  that  it  appears  to  have  occurred  only  in  the  upper  half  of  the  wage  distribution. 
Below  the  median,  the  conditional  quantiles  for  1990  and  2000  overlap.  Panel  C  shows  a  similar 
pattern  when  the  covariate  distribution  is  held  fixed.  Autor,  Katz,  and  Kearney  (2004)  report 
a  similar  asymmetry  in  their  analysis  of  CPS  data,  with  virtually  all  inequality  growth  in  the 
1990s  in  the  upper  half  of  the  wage  distribution. 

4.2  Accuracy  of  the  QR  Picture 

While Figure 6 provides a useful distillation of the QR results, we are especially interested in
whether the linear QR model accurately captures key features of changing residual inequality in
this period, both overall and for specific groups. The large census data sets allow us to compare
QR estimates with the corresponding non-parametric estimates of the CQF. Paralleling the
analysis of 1980 census data in the previous section, we begin our analysis of changing residual
inequality by assessing the quality of the QR fit to the CQF for 1990 and 2000 census data.
Figures 7 and 8 show the QR fit to the CQF in both census data sets, for a model where the sole
regressor is years of schooling. As for 1980, the fit is reasonably good at all quantiles, though
somewhat worse at the .75 and .9 quantiles than lower down, especially for 2000. Again, the
corresponding QR coefficient estimates are reported in Panel A of Table 1.

16 Panel C uses a slightly different schooling recode to maximize comparability; see the appendix for details.

The  figures  also  compare  the  QR  regression  line  to  the  Chamberlain  MD  line,  obtained  from 
a  histogram-weighted  fit  of  the  linear  model  to  the  CQF.  Again,  as  for  1980,  the  MD  and  QR 
lines  are  almost  indistinguishable,  suggesting  the  importance  weights  are  flat  and/or  the  true 
CQF  is  not  too  far  from  linear.  More  evidence  on  the  nature  of  the  weighting  function  can  be 
seen  in  Figures  9  and  10,  which  plot  importance  weights  and  histogram  weights,  and  Figures  11 
and  12,  which  plot  the  importance  weights  and  density  weights.  These  figures  establish  that 
the  conditional  density  of  Y  given  X,  and  hence  the  QR  importance  weights,  are  indeed  fairly 
flat  at  all  quantiles  and  in  both  years. 

To  assess  the  performance  of  QR  as  a  tool  for  measuring  residual  inequality,  Table  2  reports 
alternative  inter-quantile  spreads  constructed  from  the  CQF  and  QR.  Panel  A  reports  estimates 
for  the  whole  sample,  averaged  using  the  sample  distribution  of  the  covariates.  This  panel  shows 
an  important  overall  increase  in  wage  inequality,  which  cannot  be  totally  explained  by  changes 
in  the  distribution  of  and  returns  to  the  covariates.  The  QR  90-10  spread  tracks  the  CQF  90-10 
spread  remarkably  well;  the  latter  runs  from  1.20  to  1.43,  while  QR  implies  a  90-10  spread 
ranging  from  1.19  to  1.45  in  the  model  that  controls  for  schooling,  race  and  experience.  Results 
are  equally  good  for  the  inter-quartile  range  and  the  two-half-spreads,  and  for  the  model  that 
only  controls  for  schooling.  The  asymmetry  of  residual  inequality  growth  since  1990  can  be 
seen  by  comparing  the  change  in  the  90-50  and  50-10  spreads. 

The  evolution  of  residual  inequality  for  specific  schooling  groups  provides  a  more  stringent 
test  of  the  QR  approach.  Panels  B  and  C  of  Table  2  report  results  from  a  model  that  includes 
schooling  with  and  without  potential  experience  and  race,  evaluated  for  specific  schooling  groups. 
The  90-10  spread  based  on  the  CQF  for  high  school  graduates  (12  years  of  schooling)  moves  from 
1.09  in  1980  to  1.26  in  1990  to  1.29  in  2000,  when  race  and  experience  are  included.  QR  fitted 
values  similarly  show  an  increase  from  1.17  in  1980  to  1.31  in  1990  and  1.32  in  2000.  Thus,  like 
the  CQF  for  high  school  graduates,  QR  shows  an  increase  in  residual  inequality  of  around  .14 
in the first decade, with essentially no change in the second. The results are similar without
controlling for race and experience. The evolution of the inter-quartile range appears to have
been  broadly  similar  to  that  of  the  90-10  spread  for  the  high  school  group. 

While  residual  inequality  grew  little  for  high  school  graduates  in  the  1990s,  college  graduates 
(16  years  of  schooling)  saw  a  substantial  increase  in  wage  dispersion.  This  echoes  Autor,  Katz, 
and  Kearney's  (2004)  comparison  of  college  and  high  school  graduates  using  the  CPS.  Again, 
QR  captures  the  essential  features  of  this  pattern  remarkably  well.  The  90-10  spread  estimated 
from  the  CQF  for  college  graduates  increased  from  1.26  to  1.44  in  the  1980s  and  then  to  1.55  in 
the  1990s.  The  corresponding  QR  estimates  imply  an  increase  from  1.19  to  1.38  in  the  1980s, 
and then to 1.57 in the 1990s. The QR estimates also capture about two-thirds of the growth
in  residual  inequality  over  the  entire  period  for  the  other  spreads  considered.  The  ability  of 
QR  to  track  these  changes  seems  especially  impressive  given  the  changes  (detailed  in  the  data 
appendix)  in  the  underlying  schooling  variable  across  censuses. 

4.3       QR-based  measures  of  inequality 

As with variance-based measures of dispersion, we can use quantile spreads and their QR
approximations to provide convenient summary measures of residual inequality. A natural measure
already discussed is the inter-quantile range: $IQR_{\tau,\tau'}[Y|X] \approx X'\beta(\tau) - X'\beta(\tau') = X'[\beta(\tau) - \beta(\tau')]$,
where $\tau$ is some high index, for example 90%, and $\tau'$ is some low index, for example 10%. A
summary or typical measure of residual inequality is the median IQR,

$$RI_{\tau,\tau'} = Med\{IQR_{\tau,\tau'}[Y|X]\} \approx Med\{X'\beta(\tau) - X'\beta(\tau')\}. \qquad (47)$$

On the other hand, a reasonable measure of between-group inequality can be given by the
inter-quantile range of the conditional median:

$$BI_{\tau,\tau'} = IQR_{\tau,\tau'}\{Med[Y|X]\} \approx IQR_{\tau,\tau'}\{X'\beta(1/2)\}, \qquad (48)$$

which measures the variation in the central location of the conditional distribution.

To grade the relative importance of within and between-group inequality, we can define the
following "residual-to-total" ratio and its QR approximation:

$$RTR_{\tau,\tau'} = \frac{[Med\{IQR_{\tau,\tau'}[Y|X]\}]^2}{[Med\{IQR_{\tau,\tau'}[Y|X]\}]^2 + [IQR_{\tau,\tau'}\{Med[Y|X]\}]^2} \qquad (49)$$

$$\approx \frac{[Med\{X'\beta(\tau) - X'\beta(\tau')\}]^2}{[Med\{X'\beta(\tau) - X'\beta(\tau')\}]^2 + [IQR_{\tau,\tau'}\{X'\beta(1/2)\}]^2}. \qquad (50)$$


The RTR measure can be motivated by an analogy to traditional analysis of variance models,
where the ratio of residual to total variance is "$1 - R^2$", i.e.

$$\frac{E\{V[Y|X]\}}{E\{V[Y|X]\} + V\{E[Y|X]\}}. \qquad (51)$$

RTR replaces standard deviation with inter-quantile range as a measure of dispersion and means
with medians as a measure of location. In fact, in the classical normal location-shift model,
$Y = X'\beta + U$, the two measures coincide (this is the reason why the squares are present in the
definition of RTR), but they would be different in general.17 RTR is nonnegative by construction
and satisfies the natural restrictions

$$0 \le RTR_{\tau,\tau'} \le 1. \qquad (52)$$

RTR necessarily equals 1 if $Y$ is independent of $X$ (no between-group inequality) and equals 0
when conditional dispersion is zero (no within-group inequality).
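
The measures in (47)-(49) can be computed directly when the covariates are discrete, by treating each covariate cell as a group. The sketch below is an illustration, not the authors' code: it uses nonparametric cell quantiles rather than QR fitted values, weights cells by their sample size, and the function names and toy inputs are hypothetical.

```python
import math

def sample_quantile(values, tau):
    """Order statistic with rank ceil(tau * n)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(tau * len(ordered)))
    return ordered[rank - 1]

def inequality_measures(cells, tau_hi=0.9, tau_lo=0.1):
    """Return (RI, BI, RTR) from a dict mapping each covariate cell to its outcomes.

    RI  = median over observations of the within-cell tau_hi - tau_lo spread, as in (47)
    BI  = tau_hi - tau_lo spread of the within-cell medians, as in (48)
    RTR = RI^2 / (RI^2 + BI^2), as in (49)
    """
    spreads, medians = [], []
    for outcomes in cells.values():
        spread = sample_quantile(outcomes, tau_hi) - sample_quantile(outcomes, tau_lo)
        median = sample_quantile(outcomes, 0.5)
        # Repeat each cell-level value once per observation so that the
        # summaries below are taken over the sample distribution of X.
        spreads.extend([spread] * len(outcomes))
        medians.extend([median] * len(outcomes))
    ri = sample_quantile(spreads, 0.5)
    bi = sample_quantile(medians, tau_hi) - sample_quantile(medians, tau_lo)
    total = ri ** 2 + bi ** 2
    rtr = ri ** 2 / total if total > 0 else float("nan")
    return ri, bi, rtr

# Toy cells (made up for illustration): same dispersion, different locations.
toy_cells = {
    "low": [0.0, 1.0, 2.0, 3.0, 4.0],
    "mid": [1.0, 2.0, 3.0, 4.0, 5.0],
    "high": [3.0, 4.0, 5.0, 6.0, 7.0],
}
print(inequality_measures(toy_cells))
```

The two boundary cases in the text correspond to `bi == 0` (identical cells, so the ratio is 1) and `ri == 0` (no dispersion within cells, so the ratio is 0).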

Table 3 compares ANOVA and quantile-based estimates of between-group inequality, within-group
inequality, and the relative importance of within-group inequality for both a non-parametric
and a linear model of log-earnings that includes schooling, race and potential experience as
covariates. QR and CQ-based measures are generally closer for within-group inequality than for
between-group inequality. Both QR and CQ-based measures suggest a sharp increase in within-group
and between-group inequality, especially in the upper tail. For example, $RI_{90,50}$ and
$BI_{90,50}$ grew much faster than $RI_{50,10}$ and $BI_{50,10}$. On the other hand, there is no clear trend
in the relative importance of within-group inequality. For example, the QR-based $RTR_{90,10}$
goes from 80% to 81% between 1980 and 1990, and then back to 78% in 2000. Some of these
general trends are also captured by the standard ANOVA-based measures, but the latter do
not capture the asymmetric changes in the upper and lower tails.

5      Summary  and  conclusions 

We  have  shown  how  linear  quantile  regression  provides  a  weighted  least  squares  approximation 
to  an  unknown  and  potentially  nonlinear  conditional  quantile  function,  much  as  OLS  provides 
a  least  squares  approximation  to  a  nonlinear  CEF.  The  QR  approximation  property  leads  to 
partial  quantile  plots  and  an  omitted  variables  bias  formula,  analogous  to  standard  specification 
tools  for  OLS. 


17 An alternative choice for the denominator is the marginal interquantile range $Q_\tau(Y) - Q_{\tau'}(Y)$. However, this
leads to a relative measure that can exceed 1.

A natural question raised by the relationships explored here is the sensitivity of QR to
changes in sample design. Unlike a postulated-as-true linear model, the nature of the QR
approximation changes in stratified samples. Of course, an OLS regression line has this feature
as well. Like the OLS approximation to a nonlinear CEF, the nature of the weights underlying
the QR approximation to a nonlinear CQF changes as the histogram of $X$ changes (though
not otherwise, since the importance weights are a function of $X$). The role played by the QR
weighting scheme seems like an empirical, application-specific question. In practice, it may be
of interest to use stratification weights to improve the linear QR fit for subpopulations of special
interest. This is a topic we plan to explore in future work.

While  misspecification  of  the  CQF  functional  form  does  not  affect  the  usefulness  of  QR,  it 
does  have  implications  for  inference.  We  have  presented  a  misspecification-robust  distribution 
theory  for  the  QR  process.  This  provides  a  foundation  for  uniform  confidence  intervals  and  a 
basis  for  global  tests  of  hypotheses  about  distribution.  The  interpretation  of  such  tests  is  more 
subtle,  however,  when  the  assumption  of  correct  specification  is  dropped.  The  results  of  a  global 
test  may  change  as  the  nature  of  the  QR  approximation  changes. 

Finally, we used the tools here to describe the wage distribution in three censuses, proposing
summary measures of between-group and within-group inequality. For the most part, linear QR
captures the evolution of the conditional wage distribution remarkably well. Of particular interest
is the finding that the growth of within-group inequality between 1990 and 2000 is largely due to
an expansion of the upper half of the conditional wage distribution and the growing inequality
in the wage distribution of college graduates. Traditional regression-based inequality measures
miss these developments.
A      Appendix:  Proofs


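A.1     Proof of Theorem 1.

We have that

$$\beta(\tau) = \arg\min_{\beta \in \mathbb{R}^d} E\left[\rho_\tau(Y - X'\beta)\right]. \tag{53}$$

Then, we can subtract $E[\rho_\tau(Y - Q_\tau(Y|X))]$, without affecting the optimization, because it does not
depend on $\beta$ and is finite by condition (ii):

$$\beta(\tau) = \arg\min_{\beta \in \mathbb{R}^d} \left\{ E\left[\rho_\tau(Y - X'\beta)\right] - E\left[\rho_\tau(Y - Q_\tau(Y|X))\right] \right\}. \tag{54}$$

Write, for $\epsilon_\tau = Y - Q_\tau(Y|X)$ and $\Delta_\tau(X,\beta) = X'\beta - Q_\tau(Y|X)$,

$$\begin{aligned}
& E[\rho_\tau(\epsilon_\tau - \Delta_\tau(X,\beta))] - E[\rho_\tau(\epsilon_\tau)] \\
&\quad = E[(\tau - 1\{\epsilon_\tau < \Delta_\tau(X,\beta)\})(\epsilon_\tau - \Delta_\tau(X,\beta))] - E[(\tau - 1\{\epsilon_\tau < 0\})\epsilon_\tau] \\
&\quad = \underbrace{E[(1\{\epsilon_\tau < \Delta_\tau(X,\beta)\} - \tau)\Delta_\tau(X,\beta)]}_{I} - \underbrace{E[(1\{\epsilon_\tau < \Delta_\tau(X,\beta)\} - 1\{\epsilon_\tau < 0\})\epsilon_\tau]}_{II}.
\end{aligned} \tag{55}$$

Now, write

$$\begin{aligned}
I &= E[(1\{\epsilon_\tau < \Delta_\tau(X,\beta)\} - \tau)\Delta_\tau(X,\beta)] \\
&\overset{(a)}{=} E\left[E[(1\{\epsilon_\tau < \Delta_\tau(X,\beta)\} - \tau)|X]\,\Delta_\tau(X,\beta)\right] \\
&= E\left[[F_{\epsilon_\tau}(\Delta_\tau(X,\beta)|X) - F_{\epsilon_\tau}(0|X)]\,\Delta_\tau(X,\beta)\right] \\
&\overset{(b)}{=} E\left[\left(\int_0^1 f_{\epsilon_\tau}(u\Delta_\tau(X,\beta)|X)\,\Delta_\tau(X,\beta)\,du\right)\Delta_\tau(X,\beta)\right] \\
&\overset{(c)}{=} E\left[\left(\int_0^1 f_{\epsilon_\tau}(u\Delta_\tau(X,\beta)|X)\,du\right)\Delta_\tau^2(X,\beta)\right],
\end{aligned} \tag{56}$$

where (a) is by the law of iterated expectations, (b) is by condition (i) (a.s. existence of conditional
density), and (c) is by linearity of the integral. Similarly,

$$\begin{aligned}
II &= E[1\{\epsilon_\tau \in [0, \Delta_\tau(X,\beta)]\} \cdot |\epsilon_\tau|] + E[1\{\epsilon_\tau \in [\Delta_\tau(X,\beta), 0]\} \cdot |\epsilon_\tau|] && (57) \\
&= E[1\{u_\tau \in [0,1]\} \cdot u_\tau \cdot |\Delta_\tau(X,\beta)|], && (58)
\end{aligned}$$

where

$$u_\tau = \epsilon_\tau / \Delta_\tau(X,\beta) \ \text{ if } \Delta_\tau(X,\beta) \neq 0, \qquad u_\tau = 1 \ \text{ if } \Delta_\tau(X,\beta) = 0. \tag{59}$$

Next, note that for the case $\Delta_\tau(X,\beta) \neq 0$

$$f_{u_\tau}(u|X) \cdot du = f_{\epsilon_\tau}(u\Delta_\tau(X,\beta)|X) \cdot |\Delta_\tau(X,\beta)| \cdot du, \tag{60}$$

so

$$E[1\{u_\tau \in [0,1]\}u_\tau|X] \cdot |\Delta_\tau(X,\beta)| = \left(\int_0^1 u f_{u_\tau}(u|X)\,du\right)|\Delta_\tau(X,\beta)| = \left(\int_0^1 u f_{\epsilon_\tau}(u\Delta_\tau(X,\beta)|X)\,du\right)\Delta_\tau^2(X,\beta). \tag{61}$$

For cases when $\Delta_\tau(X,\beta) = 0$

$$E[1\{u_\tau \in [0,1]\}u_\tau|X] \cdot |\Delta_\tau(X,\beta)| = 0. \tag{62}$$

Thus, it follows that

$$II = E[1\{u_\tau \in [0,1]\}u_\tau|\Delta_\tau(X,\beta)|] = E\left[E[1\{u_\tau \in [0,1]\}u_\tau|X]\,|\Delta_\tau(X,\beta)|\right] = E\left[\left(\int_0^1 u f_{\epsilon_\tau}(u\Delta_\tau(X,\beta)|X)\,du\right)\Delta_\tau^2(X,\beta)\right]. \tag{63}$$

Combining (56) and (63), the objective in (54) equals $I - II = E\left[\left(\int_0^1 (1-u) f_{\epsilon_\tau}(u\Delta_\tau(X,\beta)|X)\,du\right)\Delta_\tau^2(X,\beta)\right]$,
which is the weighted mean-squared specification error with the importance weights as the weighting function. ■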
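
The identity behind Theorem 1 can be checked numerically at a single covariate value. The sketch below is illustrative only: it assumes a hypothetical error $\epsilon_\tau = Z - z_\tau$ with $Z$ standard normal (so that the $\tau$-quantile of $\epsilon_\tau$ is zero), treats $\Delta$ as a fixed number, and compares the difference in expected check-function losses with the weighted squared error $w \cdot \Delta^2$, where $w = \int_0^1 (1-u) f_{\epsilon_\tau}(u\Delta)\,du$. The function names and the trapezoid rule are choices made for this example, not part of the paper.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def check_function(u, tau):
    """Quantile check function rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def trapezoid(func, a, b, n):
    """Composite trapezoid rule with n subintervals on [a, b]."""
    h = (b - a) / n
    total = 0.5 * (func(a) + func(b))
    for k in range(1, n):
        total += func(a + k * h)
    return total * h


def loss_difference(delta, tau):
    """E[rho(eps - delta)] - E[rho(eps)] for eps = Z - z_tau, Z ~ N(0, 1)."""
    z_tau = STD_NORMAL.inv_cdf(tau)

    def integrand(e):
        density = STD_NORMAL.pdf(e + z_tau)  # density of eps evaluated at e
        return (check_function(e - delta, tau) - check_function(e, tau)) * density

    return trapezoid(integrand, -10.0, 10.0, 80_000)


def weighted_square(delta, tau):
    """w * delta^2 with w = int_0^1 (1 - u) f_eps(u * delta) du."""
    z_tau = STD_NORMAL.inv_cdf(tau)
    weight = trapezoid(
        lambda u: (1.0 - u) * STD_NORMAL.pdf(u * delta + z_tau), 0.0, 1.0, 2_000
    )
    return weight * delta ** 2


if __name__ == "__main__":
    for delta in (-0.8, 0.3, 1.5):
        print(delta, loss_difference(delta, 0.25), weighted_square(delta, 0.25))
```

The two columns printed for each value of `delta` should agree up to quadrature error, for positive and negative specification errors alike.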


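A.2     Proof of Theorem 2.

We have to prove that $\beta(\tau)$ that solves

$$\beta(\tau) = \arg\min_{\beta \in \mathbb{R}^d} E\left[\rho_\tau(Y - X'\beta)\right] \tag{P1}$$

is equal to $\beta^*(\tau)$ that solves

$$\beta^*(\tau) = \arg\min_{\beta \in \mathbb{R}^d} E\left[w_\tau(X) \cdot \Delta_\tau^2(X,\beta)\right]. \tag{P2}$$

The FOC for program (P2) is given by

$$2\,E\left[w_\tau(X)\,\Delta_\tau(X,\beta^*(\tau))\,X\right] = 0, \tag{64}$$

where

$$w_\tau(X) = \frac{1}{2}\int_0^1 f_{\epsilon_\tau}(u \cdot \Delta_\tau(X,\beta(\tau))|X)\,du. \tag{65}$$

The FOC for program (P1) is given by

$$I' = E\left[(1\{\epsilon_\tau < \Delta_\tau(X,\beta(\tau))\} - \tau)\,X\right] = 0, \tag{66}$$

which by calculations similar to those in (56) can be written as

$$\begin{aligned}
I' &\overset{(a)}{=} E\left[E[(1\{\epsilon_\tau < \Delta_\tau(X,\beta(\tau))\} - \tau)|X] \cdot X\right] \\
&= E\left[(F_{\epsilon_\tau}(\Delta_\tau(X,\beta(\tau))|X) - F_{\epsilon_\tau}(0|X)) \cdot X\right] \\
&\overset{(b)}{=} E\left[\left(\int_0^1 f_{\epsilon_\tau}(u\Delta_\tau(X,\beta(\tau))|X)\,\Delta_\tau(X,\beta(\tau))\,du\right) \cdot X\right] \\
&\overset{(c)}{=} E\left[\left(\int_0^1 f_{\epsilon_\tau}(u\Delta_\tau(X,\beta(\tau))|X)\,du\right)\Delta_\tau(X,\beta(\tau)) \cdot X\right],
\end{aligned} \tag{67}$$

where (a) is by the law of iterated expectations, (b) is by a.s. existence of conditional density, and (c) by
linearity of the integral. By the definition of $w_\tau(X)$

$$I' = 2\,E\left[w_\tau(X)\,\Delta_\tau(X,\beta(\tau))\,X\right] = 0. \tag{68}$$

Finally, note that this is precisely the FOC for program (P2).

Both program (P1) and program (P2) are convex. (P1) has a unique solution $\beta(\tau)$ by assumption, which
means it uniquely solves the FOC. Hence, since (P1) and (P2) have the same first order condition, it
follows that $\beta^*(\tau) = \beta(\tau)$ uniquely solves the FOC for both programs. ■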
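
The equivalence of (P1) and (P2) can be illustrated in a small population example. The sketch below is hypothetical and not taken from the paper: it assumes a scalar regressor $X \in \{1, 2\}$ with equal probabilities, no intercept, and $Y|X=x \sim N(m(x), 1)$, so that the CQF is not linear through the origin. It solves the QR first-order condition by bisection, builds the weights $w_\tau(x)$ at that solution using $\int_0^1 f_{\epsilon_\tau}(u\Delta|x)\,du = (F_{\epsilon_\tau}(\Delta|x) - \tau)/\Delta$, and confirms that the weighted least squares fit to the CQF returns the same coefficient.

```python
from statistics import NormalDist

PHI = NormalDist()
TAU = 0.25
Z_TAU = PHI.inv_cdf(TAU)

# Hypothetical population: X in {1, 2}, no intercept, Y | X = x ~ N(m(x), 1),
# so Q_tau(Y | X = x) = m(x) + z_tau.
SUPPORT = (1.0, 2.0)
PROB = {1.0: 0.5, 2.0: 0.5}
MEAN = {1.0: 1.0, 2.0: 3.0}


def cqf(x):
    """True conditional quantile function Q_tau(Y | X = x)."""
    return MEAN[x] + Z_TAU


def qr_foc(beta):
    """E[(1{Y < X beta} - tau) X], the first-order condition of (P1)."""
    return sum(PROB[x] * x * (PHI.cdf(x * beta - MEAN[x]) - TAU) for x in SUPPORT)


def solve_qr(lo=-10.0, hi=10.0):
    """Bisection on the FOC, which is increasing in beta because X > 0."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if qr_foc(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def qr_weight(x, beta):
    """w_tau(x) = (1/2) * (F_eps(Delta | x) - tau) / Delta, Delta = x*beta - Q_tau."""
    delta = x * beta - cqf(x)
    return (PHI.cdf(x * beta - MEAN[x]) - TAU) / (2.0 * delta)


def solve_wls(beta):
    """Minimizer of E[w_tau(X) (X b - Q_tau(Y|X))^2] with weights fixed at beta."""
    num = sum(PROB[x] * qr_weight(x, beta) * x * cqf(x) for x in SUPPORT)
    den = sum(PROB[x] * qr_weight(x, beta) * x * x for x in SUPPORT)
    return num / den


if __name__ == "__main__":
    beta_qr = solve_qr()
    print(beta_qr, solve_wls(beta_qr))
```

Because the weights are evaluated at $\beta(\tau)$ itself, the check is a fixed-point statement: the WLS problem built from the QR solution reproduces that solution.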
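A.3     Proof of Theorem 3.

Taking claim 1 as given, claim 2 is immediate:

$$\begin{aligned}
\gamma_1(\tau) &= E[w_\tau^*(X)X_1X_1']^{-1}E\left[w_\tau^*(X)X_1(X_1'\beta_1(\tau) + R_\tau(X))\right] && (69) \\
&= \beta_1(\tau) + E[w_\tau^*(X)X_1X_1']^{-1}E\left[w_\tau^*(X)X_1R_\tau(X)\right]. && (70)
\end{aligned}$$

It remains to prove claim 1. The first order condition of the quantile regression of Y on $X_1$ in the
population is given by

$$E\left[(1\{Y < X_1'\gamma_1(\tau)\} - \tau)X_1\right] = 0, \tag{71}$$

or for $\epsilon_\tau = Y - Q_\tau(Y|X)$ and $\Delta_\tau(X,\gamma_1(\tau)) = X_1'\gamma_1(\tau) - Q_\tau(Y|X)$,

$$E\left[(1\{\epsilon_\tau < \Delta_\tau(X,\gamma_1(\tau))\} - \tau)X_1\right] = 0. \tag{72}$$

This can be rewritten as

$$E\left[E[1\{\epsilon_\tau < \Delta_\tau(X,\gamma_1(\tau))\} - 1\{\epsilon_\tau < 0\}|X_1]\,X_1\right] = 0, \tag{73}$$

since $P\{\epsilon_\tau < 0|X_1\} = E[P\{\epsilon_\tau < 0|X_1,X_2\}|X_1] = E[\tau|X_1] = \tau$. Write

$$\begin{aligned}
E[1\{\epsilon_\tau < \Delta_\tau(X,\gamma_1(\tau))\} - 1\{\epsilon_\tau < 0\}|X_1]
&\overset{(a)}{=} E\left[E[1\{\epsilon_\tau < \Delta_\tau(X,\gamma_1(\tau))\} - 1\{\epsilon_\tau < 0\}|X]\,|X_1\right] \\
&= E\left[[F_{\epsilon_\tau}(\Delta_\tau(X,\gamma_1(\tau))|X) - F_{\epsilon_\tau}(0|X)]\,|X_1\right] \\
&\overset{(b)}{=} E\left[\left(\int_0^1 f_{\epsilon_\tau}(u\Delta_\tau(X,\gamma_1(\tau))|X)\,\Delta_\tau(X,\gamma_1(\tau))\,du\right)\Big|X_1\right] \\
&\overset{(c)}{=} E\left[\left(\int_0^1 f_{\epsilon_\tau}(u\Delta_\tau(X,\gamma_1(\tau))|X)\,du\right)\Delta_\tau(X,\gamma_1(\tau))\Big|X_1\right],
\end{aligned} \tag{74}$$

where (a) is by the law of iterated expectations, (b) is by a.s. existence of conditional density, and (c)
by linearity of the integral. Defining $w_\tau^*(X) = \frac{1}{2}\int_0^1 f_{\epsilon_\tau}(u\Delta_\tau(X,\gamma_1(\tau))|X)\,du$, we can rewrite the previous
first order condition as

$$2\,E\left[E[w_\tau^*(X)\,\Delta_\tau(X,\gamma_1(\tau))|X_1] \cdot X_1\right] = 0, \tag{75}$$

or, by the law of iterated expectations

$$2\,E\left[w_\tau^*(X)\,\Delta_\tau(X,\gamma_1(\tau)) \cdot X_1\right] = 0. \tag{76}$$

Finally, note that this is precisely the first order condition for the program

$$\gamma_1(\tau) = \arg\min_{\gamma_1} E\left[w_\tau^*(X) \cdot (X_1'\gamma_1 - Q_\tau(Y|X))^2\right]. \ ■ \tag{77}$$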

A.4     Notation for Proofs of Theorems 4 and 5

We use the following empirical processes in the sequel, for $W = (Y, X)$

$$f \mapsto E_n[f(W)] = \frac{1}{n}\sum_{i=1}^n f(W_i), \qquad f \mapsto G_n[f(W)] = \frac{1}{\sqrt{n}}\sum_{i=1}^n \left(f(W_i) - E[f(W_i)]\right). \tag{78}$$

If $\hat{f}$ is an estimated function, $G_n[\hat{f}(W)]$ denotes $\frac{1}{\sqrt{n}}\sum_{i=1}^n (f(W_i) - E[f(W_i)])|_{f=\hat{f}}$. Other basic notation
and stochastic convergence concepts, such as weak convergence in the space of bounded functions,
stochastic equicontinuity, Donsker classes, and Vapnik-Cervonenkis (VC) classes, are used as defined
in van der Vaart (1998).

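A.5     Proof of Theorem 4

Observe that, for each $\tau$ in $\mathcal{T}$, $\hat\beta(\tau)$ minimizes

$$Q_n(\tau,\beta) = E_n\left[\rho_\tau(Y - X'\beta) - \rho_\tau(Y - X'\beta(\tau))\right]. \tag{79}$$

Define

$$Q_\infty(\tau,\beta) = E\left[\rho_\tau(Y - X'\beta) - \rho_\tau(Y - X'\beta(\tau))\right]. \tag{80}$$

By A.2 and A.3, $Q_\infty(\tau,\beta)$ is uniquely minimized at $\beta(\tau)$ for each $\tau$ in $\mathcal{T}$.

Since by Knight's identity $\rho_\tau(u - v) - \rho_\tau(u) = -(\tau - 1\{u < 0\})v + \int_0^v (1\{u < s\} - 1\{u < 0\})\,ds$, we
have, by setting $u = Y - X'\beta(\tau)$ and $v = X'(\beta - \beta(\tau))$, that

$$\begin{aligned}
\rho_\tau(Y - X'\beta) - \rho_\tau(Y - X'\beta(\tau)) = {} & -(\tau - 1\{Y < X'\beta(\tau)\})\,X'(\beta - \beta(\tau)) \\
& + \int_0^{X'(\beta - \beta(\tau))} \left[1\{Y < X'\beta(\tau) + s\} - 1\{Y < X'\beta(\tau)\}\right]ds.
\end{aligned} \tag{81}$$

Thus, it follows that for any $\beta \in \mathbb{R}^d$

$$|Q_\infty(\tau,\beta)| \le 2 \cdot E|X'(\beta - \beta(\tau))| \le 2 \cdot E\|X\| \cdot \|\beta - \beta(\tau)\| < \infty. \tag{82}$$

We can also show that for any compact set B

$$Q_n(\tau,\beta) = Q_\infty(\tau,\beta) + o_{p^*}(1), \quad \text{uniformly in } (\tau,\beta) \in \mathcal{T} \times B. \tag{83}$$

This statement is true pointwise by the Khinchin LLN. The uniform convergence follows because

$$|Q_n(\tau',\beta') - Q_n(\tau'',\beta'')| \le C_1 \cdot |\tau' - \tau''| + C_2 \cdot \|\beta' - \beta''\|, \tag{84}$$

where

$$C_1 = 2 \cdot E\|X\| \cdot \sup_{\beta \in B}\|\beta\| < \infty \quad \text{and} \quad C_2 = 2 \cdot E\|X\| < \infty. \tag{85}$$

Hence the empirical process $(\tau,\beta) \mapsto Q_n(\tau,\beta)$ is stochastically equicontinuous, which implies the uniform
convergence.

Consider a collection of closed balls $B_M(\beta(\tau))$ of radius M and center $\beta(\tau)$, and let $\beta_M(\tau) = \beta(\tau) +
\delta_M(\tau) \cdot v(\tau)$, where $v(\tau) = (v_1(\tau), ..., v_d(\tau))'$ is a direction vector with unit norm $\|v(\tau)\| = 1$ and $\delta_M(\tau)$
is a positive scalar such that $\delta_M(\tau) \ge M$. Then uniformly in $\tau \in \mathcal{T}$,

$$\begin{aligned}
\frac{M}{\delta_M(\tau)}\left(Q_n(\tau,\beta_M(\tau)) - Q_n(\tau,\beta(\tau))\right)
&\overset{(a)}{\ge} Q_n(\tau,\beta_M^*(\tau)) - Q_n(\tau,\beta(\tau)) \\
&\overset{(b)}{\ge} Q_\infty(\tau,\beta_M^*(\tau)) - Q_\infty(\tau,\beta(\tau)) + o_{p^*}(1) \\
&\overset{(c)}{>} \epsilon_M + o_{p^*}(1),
\end{aligned} \tag{86}$$

for some $\epsilon_M > 0$, where (a) follows by convexity in $\beta$, for $\beta_M^*(\tau)$ the point of the boundary of $B_M(\beta(\tau))$
on the line connecting $\beta_M(\tau)$ and $\beta(\tau)$, (b) follows by the uniform convergence established in (83), and (c)
follows by the assumption that $\beta(\tau)$ is the unique minimizer of $Q_\infty(\tau,\beta)$ uniformly in $\tau \in \mathcal{T}$. Hence for
any $M > 0$, the minimizer $\hat\beta(\tau)$ must be within M from $\beta(\tau)$ uniformly for all $\tau \in \mathcal{T}$, with probability
approaching one. That is, we have that for any $M > 0$, $\|\hat\beta(\tau) - \beta(\tau)\| \le M$ uniformly for all $\tau \in \mathcal{T}$ with
probability approaching one. ■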
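
Knight's identity, used in the proof of Theorem 4, can be verified directly. The sketch below is an illustration under one stated simplification: the integral $\int_0^v (1\{u < s\} - 1\{u < 0\})\,ds$ is evaluated in closed form (it equals $\max(v - u, 0)$ when $v > 0$ and $u \ge 0$, equals $\max(u - v, 0)$ when $v < 0$ and $u < 0$, and is zero otherwise). The function names are chosen for this example.

```python
def check_function(u, tau):
    """Quantile check function rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def knight_integral(u, v):
    """Closed form of the integral of (1{u < s} - 1{u < 0}) over s from 0 to v."""
    if v > 0 and u >= 0:
        return max(v - u, 0.0)
    if v < 0 and u < 0:
        return max(u - v, 0.0)
    return 0.0


def knight_rhs(u, v, tau):
    """Right-hand side of Knight's identity: linear term plus integral term."""
    return -(tau - (1.0 if u < 0 else 0.0)) * v + knight_integral(u, v)


if __name__ == "__main__":
    for u, v in ((1.0, 3.0), (-1.0, -3.0), (-1.0, 2.0), (1.0, -2.0)):
        lhs = check_function(u - v, 0.3) - check_function(u, 0.3)
        print(u, v, lhs, knight_rhs(u, v, 0.3))
```

The integral term is nonnegative, which is what makes the decomposition useful for bounding the objective in (82).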

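A.6     Proof of Theorem 5

First, by the computational properties of $\hat\beta(\tau)$, for all $\tau \in \mathcal{T}$, cf. Theorem 3.3 in Koenker and Bassett
(1978):

$$\left\|E_n\left[\varphi_\tau(Y - X'\hat\beta(\tau))X\right]\right\| \le \text{const} \cdot \frac{\sup_{i \le n}\|X_i\|}{n}, \tag{87}$$

where $\varphi_\tau(u) = \tau - 1\{u < 0\}$. Note that $E\|X_i\|^{2+\epsilon} < \infty$ implies $\sup_{i \le n}\|X_i\| = o_{p^*}(n^{1/2})$, since

$$P\left(\sup_{i \le n}\|X_i\| > n^{1/2}\right) \le nP(\|X_i\| > n^{1/2}) \le nE\|X_i\|^{2+\epsilon}/n^{\frac{2+\epsilon}{2}} = o(1). \tag{88}$$

Hence uniformly in $\tau \in \mathcal{T}$,

$$E_n\left[\varphi_\tau(Y - X'\hat\beta(\tau))X\right] = o_{p^*}(n^{-1/2}). \tag{89}$$

Second, $(\tau,\beta) \mapsto G_n[\varphi_\tau(Y - X'\beta)X]$ is stochastically equicontinuous over $B \times \mathcal{T}$, where B is any
compact set, with respect to the $L_2(P)$ pseudometric

$$\rho((\tau',\beta'),(\tau'',\beta'')) \equiv \max_{j \in 1,...,d}\sqrt{E\left[(\varphi_{\tau'}(Y - X'\beta')X_j - \varphi_{\tau''}(Y - X'\beta'')X_j)^2\right]}, \tag{90}$$

for $j \in 1, ..., d$ indexing the components of the vector X. This is because the functional class $\mathcal{F} =
\{1\{Y < X'\beta\}, \beta \in B\}$ is a VC subgraph class and hence also a Donsker class, with envelope 2. Hence
the functional class $\mathcal{T} - \mathcal{F}$ is also Donsker with envelope equal to 2, by Theorem 2.10.6 in Van der Vaart
and Wellner (1996). The product of $\mathcal{T} - \mathcal{F}$ with X also forms a Donsker class with a square integrable
envelope $2 \cdot \max_{j \in 1,...,d}|X|_j$, by Theorem 2.10.6 in Van der Vaart and Wellner (1996). The stochastic
equicontinuity then is a part of being Donsker.

The uniform consistency $\sup_{\tau \in \mathcal{T}}\|\hat\beta(\tau) - \beta(\tau)\| = o_{p^*}(1)$ implies

$$\sup_{\tau \in \mathcal{T}} \rho((\tau,b(\tau)),(\tau,\beta(\tau)))\Big|_{b(\tau) = \hat\beta(\tau)} = o_{p^*}(1), \tag{91}$$

and therefore by stochastic equicontinuity of $(\tau,\beta) \mapsto G_n[\varphi_\tau(Y - X'\beta)X]$ we have that

$$G_n\left[\varphi_\tau(Y - X'\hat\beta(\tau))X\right] = G_n\left[\varphi_\tau(Y - X'\beta(\tau))X\right] + o_{p^*}(1), \quad \text{uniformly in } \tau. \tag{92}$$

In order to show (91) note that for $\bar{f}$ denoting the upper bound on $f_Y(y|X = x)$, application of
Hölder's inequality, a Taylor expansion and the Cauchy-Schwarz inequality give the series of inequalities:

$$\begin{aligned}
\sup_{\tau \in \mathcal{T}} \rho((\tau,b(\tau)),(\tau,\beta(\tau)))
&= \sup_{\tau \in \mathcal{T}} \max_{j \in 1,...,d} \sqrt{E\left[(\varphi_\tau(Y - X'b(\tau))X_j - \varphi_\tau(Y - X'\beta(\tau))X_j)^2\right]} \\
&\le \sup_{\tau \in \mathcal{T}} \max_{j \in 1,...,d} \left(E\left[|\varphi_\tau(Y - X'b(\tau)) - \varphi_\tau(Y - X'\beta(\tau))|^{\frac{2(2+\epsilon)}{\epsilon}}\right]\right)^{\frac{\epsilon}{2(2+\epsilon)}} \cdot \left(E|X_j|^{2+\epsilon}\right)^{\frac{1}{2+\epsilon}} \\
&\le \sup_{\tau \in \mathcal{T}} \max_{j \in 1,...,d} \left(E\left|1\{Y < X'b(\tau)\} - 1\{Y < X'\beta(\tau)\}\right|\right)^{\frac{\epsilon}{2(2+\epsilon)}} \cdot \left(E|X_j|^{2+\epsilon}\right)^{\frac{1}{2+\epsilon}} \\
&\le \sup_{\tau \in \mathcal{T}} \left(E\left|\bar{f} \cdot X'(b(\tau) - \beta(\tau))\right|\right)^{\frac{\epsilon}{2(2+\epsilon)}} \cdot \left(E\|X\|^{2+\epsilon}\right)^{\frac{1}{2+\epsilon}} \\
&\le \text{const} \cdot \sup_{\tau \in \mathcal{T}} \left(\bar{f} \cdot (E\|X\|^2)^{1/2} \cdot \|b(\tau) - \beta(\tau)\|\right)^{\frac{\epsilon}{2(2+\epsilon)}},
\end{aligned} \tag{93}$$

where the second inequality follows by binomiality of $|\varphi_\tau(Y - X'b(\tau)) - \varphi_\tau(Y - X'\beta(\tau))|$. Then,
evaluating at $b(\tau) = \hat\beta(\tau)$

$$\sup_{\tau \in \mathcal{T}} \rho((\tau,b(\tau)),(\tau,\beta(\tau)))\Big|_{b(\tau) = \hat\beta(\tau)} \le \text{const} \cdot \sup_{\tau \in \mathcal{T}} \|\hat\beta(\tau) - \beta(\tau)\|^{\frac{\epsilon}{2(2+\epsilon)}} = o_{p^*}(1), \tag{94}$$

by uniform convergence and $\epsilon > 0$.

Third, by a Taylor expansion, uniformly in $\tau \in \mathcal{T}$

$$E\left[\varphi_\tau(Y - X'\beta)X\right]\Big|_{\beta = \hat\beta(\tau)} = E\left[f_Y(X'b(\tau)|X)XX'\right]\Big|_{b(\tau) = \beta^*(\tau)} (\hat\beta(\tau) - \beta(\tau)), \tag{95}$$

where $\beta^*(\tau)$ is on the line connecting $\hat\beta(\tau)$ and $\beta(\tau)$ for each $\tau$. $\hat\beta(\tau)$ is uniformly consistent by Theorem
4, hence $\beta^*(\tau)$ is also uniformly consistent. Thus by A5, i.e. the uniform continuity and boundedness of
the mapping $y \mapsto f_Y(y|x)$, uniformly in x over the support of X, it follows that

$$E\left[f_Y(X'b(\cdot)|X)XX'\right]\Big|_{b(\cdot) = \beta^*(\cdot)} = \underbrace{E\left[f_Y(X'\beta(\cdot)|X)XX'\right]}_{J(\cdot)} + o_{p^*}(1) \quad \text{in } \ell^\infty(\mathcal{T}). \tag{96}$$

Indeed, by A5 for any compact K, $E[f_Y(X'b(\cdot)|X)XX'1\{X \in K\}]|_{b(\cdot) = \beta^*(\cdot)} = E[f_Y(X'\beta(\cdot)|X)XX'1\{X \in K\}] +
o_{p^*}(1)$. Then, for $K^c = \mathbb{R}^d \setminus K$, $E[f_Y(X'b(\cdot)|X)XX'1\{X \in K^c\}]|_{b(\cdot) = \beta^*(\cdot)}$ and $E[f_Y(X'\beta(\cdot)|X)XX'1\{X \in K^c\}]$
can be made arbitrarily small in large samples. This follows by setting the set K sufficiently large and
using $E\|XX'\| < \infty$ and $f_Y(X'\beta(\cdot)|X) \le \bar{f}$ a.s.

Fourth, since

$$\text{the left hand side (lhs) of (89)} = \text{lhs of (95)} + n^{-1/2}\ \text{lhs of (92)}, \tag{97}$$

we have by using (96)

$$\begin{aligned}
o_{p^*}(n^{-1/2}) = {} & J(\cdot)(\hat\beta(\cdot) - \beta(\cdot)) + o_{p^*}\left(\sup_{\tau \in \mathcal{T}}\|\hat\beta(\tau) - \beta(\tau)\|\right) \\
& + n^{-1/2}G_n\left[\varphi_\cdot(Y - X'\beta(\cdot))X\right] + o_{p^*}(n^{-1/2}) \quad \text{in } \ell^\infty(\mathcal{T}).
\end{aligned} \tag{98}$$

Since mineig$(J(\tau)) \ge \lambda > 0$, uniformly in $\tau \in \mathcal{T}$

$$\begin{aligned}
\sup_{\tau \in \mathcal{T}} \left\|n^{-1/2}G_n\left[\varphi_\tau(Y - X'\beta(\tau))X\right] + o_{p^*}(n^{-1/2})\right\|
&= \sup_{\tau \in \mathcal{T}} \left\|J(\tau)(\hat\beta(\tau) - \beta(\tau)) + o_{p^*}\left(\sup_{\tau \in \mathcal{T}}\|\hat\beta(\tau) - \beta(\tau)\|\right)\right\| \\
&\ge (\lambda + o_{p^*}(1)) \cdot \sup_{\tau \in \mathcal{T}}\|\hat\beta(\tau) - \beta(\tau)\|.
\end{aligned} \tag{99}$$

Fifth, by the stated assumptions, the mapping $\tau \mapsto \beta(\tau)$ is continuous. In fact, it is continuously
differentiable, since by the implicit function theorem, for $\beta(\tau)$ defined as the solution to $E[(\tau - 1\{Y < X'\beta\})X] =
0$, we have that $d\beta(\tau)/d\tau = J(\tau)^{-1}E[X]$. Hence $\tau \mapsto G_n[\varphi_\tau(Y - X'\beta(\tau))X]$ is stochastically
equicontinuous over $\mathcal{T}$ by continuity of the mapping $\tau \mapsto \beta(\tau)$ for the pseudo-metric given by $\rho(\tau',\tau'') =
\rho((\tau',\beta(\tau')),(\tau'',\beta(\tau'')))$. Then, stochastic equicontinuity of $\tau \mapsto G_n[\varphi_\tau(Y - X'\beta(\tau))X]$ and the ordinary
CLT imply that

$$G_n\left[\varphi_\cdot(Y - X'\beta(\cdot))X\right] \Rightarrow z(\cdot) \quad \text{in } \ell^\infty(\mathcal{T}), \tag{100}$$

where $z(\cdot)$ is a Gaussian process with covariance function $\Sigma(\cdot,\cdot)$ specified in the statement of Theorem
5. Therefore, the lhs of (99) is $O_p(n^{-1/2})$, implying

$$\sup_{\tau \in \mathcal{T}}\left\|\sqrt{n}(\hat\beta(\tau) - \beta(\tau))\right\| = O_{p^*}(1). \tag{101}$$

Finally, by (99)-(101)

$$\begin{aligned}
\sqrt{n}(\hat\beta(\cdot) - \beta(\cdot)) &= -J^{-1}(\cdot)G_n\left[\varphi_\cdot(Y - X'\beta(\cdot))X\right] + o_{p^*}(1) \quad \text{in } \ell^\infty(\mathcal{T}) \\
&\Rightarrow J^{-1}(\cdot)z(\cdot) \quad \text{in } \ell^\infty(\mathcal{T}). \ ■
\end{aligned} \tag{102}$$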
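
The implicit-function-theorem step in the fifth part of the proof, $d\beta(\tau)/d\tau = J(\tau)^{-1}E[X]$, can be illustrated in the simplest possible case. The sketch below assumes a hypothetical intercept-only model with $X = 1$ and $Y \sim N(0,1)$, so that $\beta(\tau)$ is the standard normal quantile function, $J(\tau) = f_Y(\beta(\tau))$ and $E[X] = 1$. It compares the formula with a central finite difference; the step size is an arbitrary choice for this example.

```python
from statistics import NormalDist

PHI = NormalDist()


def beta(tau):
    """Population QR coefficient in the intercept-only model: the tau-quantile of Y."""
    return PHI.inv_cdf(tau)


def jacobian(tau):
    """J(tau) = E[f_Y(X'beta(tau) | X) X X'] with X = 1, i.e. the density at the quantile."""
    return PHI.pdf(beta(tau))


def derivative_ift(tau):
    """d beta / d tau = J(tau)^{-1} E[X], with E[X] = 1 in this model."""
    return 1.0 / jacobian(tau)


def derivative_fd(tau, step=1e-5):
    """Central finite-difference approximation to d beta / d tau."""
    return (beta(tau + step) - beta(tau - step)) / (2.0 * step)


if __name__ == "__main__":
    for tau in (0.1, 0.5, 0.9):
        print(tau, derivative_ift(tau), derivative_fd(tau))
```

The derivative is largest in the tails, where the density at the quantile is small, which is the same mechanism that inflates QR standard errors at extreme quantiles.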

A.7     Proof of Corollaries

Proof of Corollary 1. The result is immediate from the definition of weak convergence in $\ell^\infty(\mathcal{T})$. ■

Proof of Corollary 2. The result follows by the continuous mapping theorem in $\ell^\infty(\mathcal{T})$. ■

Proof of Corollary 3. The result is immediate from Corollary 2. ■

Proof of Corollary 4. The result is immediate from Politis, Romano and Wolf (1999), Theorem 2.2.1
and Corollary 2.4.1, for the case when the rescaling matrices are known. For the case when the matrices
are consistently estimated the proof follows by an argument similar to the proof of Theorem 2.5.1 in
Politis, Romano and Wolf (1999). Finally, we also need that K has an absolutely continuous distribution.
This result follows from Theorem 11.1 in Davydov, Lifshits, and Smorodina (1998). ■

Proof of Corollary 5. Note that this corollary is not covered by the results in Powell (1986) or
Buchinsky and Hahn (1998) for consistency of $\hat{J}(\tau)$, because their proofs apply only pointwise in $\tau$,
whereas we require a uniform result.

First, recall that

$$\hat{J}(\tau) = \frac{1}{2h_n}E_n\left[1\{|Y_i - X_i'\hat\beta(\tau)| \le h_n\} \cdot X_iX_i'\right]. \tag{103}$$

We will show that

$$\hat{J}(\tau) - J(\tau) = o_{p^*}(1) \quad \text{uniformly in } \tau \in \mathcal{T}. \tag{104}$$

Note that $2h_n\hat{J}(\tau) = E_n[f_i(\hat\beta(\tau), h_n)]$, where $f_i(\beta, h) = 1\{|Y_i - X_i'\beta| \le h\} \cdot X_iX_i'$. Next, for any compact
set B and positive constant H, the functional class $\{f_i(\beta, h), \beta \in B, h \in (0, H]\}$ is a Donsker class with
square integrable envelope by Theorem 2.10.6 in Van der Vaart and Wellner (1996), since this is a product
of a VC class $\{1\{|Y_i - X_i'\beta| \le h\}, \beta \in B, h \in (0, H]\}$ and a square integrable random matrix $X_iX_i'$
(recall $E\|X_i\|^4 < \infty$ by assumption). Therefore, $(\beta, h) \mapsto G_n[f_i(\beta, h)]$ converges to a Gaussian process
in $\ell^\infty(B \times (0, H])$, which implies that

$$\sup_{\beta \in B,\, 0 < h \le H} \left\|E_n[f_i(\beta, h)] - E[f_i(\beta, h)]\right\| = O_{p^*}(n^{-1/2}). \tag{105}$$

Letting B be any compact set that covers $\cup_{\tau \in \mathcal{T}}\beta(\tau)$, this implies

$$\sup_{\tau \in \mathcal{T}} \left\|E_n[f_i(\hat\beta(\tau), h_n)] - E[f_i(\beta, h)]\big|_{\beta = \hat\beta(\tau),\, h = h_n}\right\| = O_{p^*}(n^{-1/2}). \tag{106}$$

Hence (104) follows by using that $2h_n\hat{J}(\tau) = E_n[f_i(\hat\beta(\tau), h_n)]$ and noting that $\frac{1}{2h_n} \cdot E[f_i(\beta, h)]|_{\beta = \hat\beta(\tau),\, h = h_n} =
J(\tau) + o_p(1)$ by an argument similar to that used in (96) and the assumption $h_n^2 n \to \infty$.

Second, we can write

$$\hat\Sigma(\tau, \tau') = E_n\left[g_i(\hat\beta(\tau), \hat\beta(\tau'), \tau, \tau')X_iX_i'\right], \tag{107}$$

where $g_i(\beta', \beta'', \tau', \tau'') = (\tau' - 1\{Y_i < X_i'\beta'\})(\tau'' - 1\{Y_i < X_i'\beta''\}) \cdot X_iX_i'$. We will show that

$$\hat\Sigma(\tau, \tau') - \Sigma(\tau, \tau') = o_{p^*}(1) \quad \text{uniformly in } (\tau, \tau') \in \mathcal{T} \times \mathcal{T}. \tag{108}$$

Note that $\{g_i(\beta', \beta'', \tau', \tau''), (\beta', \beta'', \tau', \tau'') \in B \times B \times \mathcal{T} \times \mathcal{T}\}$ is Donsker and hence a Glivenko-Cantelli
class, for any bounded set B. Indeed, $\mathcal{F}_B = \{1\{Y_i < X_i'\beta\}, \beta \in B\}$ is a VC class, and hence is Donsker.
Then, $\mathcal{T} - \mathcal{F}_B$ is also a bounded Donsker class with envelope 2, by Theorem 2.10.6 in Van der Vaart and
Wellner (1996). Next, the product of two bounded classes $(\mathcal{T} - \mathcal{F}_B) \times (\mathcal{T} - \mathcal{F}_B)$ is a bounded Donsker
class with envelope 4, by Theorem 2.10.6 in Van der Vaart and Wellner (1996). Last, the product of a
bounded Donsker class with a square integrable random matrix $X_iX_i'$ gives a Donsker class, by Theorem
2.10.6 in Van der Vaart and Wellner (1996).

This implies that uniformly in $(\beta', \beta'', \tau', \tau'') \in (B \times B \times \mathcal{T} \times \mathcal{T})$

$$E_n\left[g_i(\beta', \beta'', \tau', \tau'')X_iX_i'\right] - E\left[g_i(\beta', \beta'', \tau', \tau'')X_iX_i'\right] = o_{p^*}(1). \tag{109}$$

By inspection, $E[g_i(\beta', \beta'', \tau', \tau'')X_iX_i']$ is continuous in $(\beta', \beta'', \tau', \tau'')$ over $(B \times B \times \mathcal{T} \times \mathcal{T})$. Letting B
cover $\cup_\tau \beta(\tau)$, continuity and (109) imply (108).

A similar argument applies to $\hat\Sigma_0(\tau, \tau')$. ■

B      Appendix:  Estimating  the  QR  Weighting  Function 

We calculate the importance weights using equation (7). The integral was estimated with a grid of 101
points between the non-parametric estimates of the CQF ($\hat{Q}_\tau(Y|X)$) and the QR approximation ($X'\hat\beta(\tau)$),
for each cell of the covariates X. This gives rise to the following discrete approximation formula for the
importance weights

$$\hat{w}_\tau(x, \hat\beta(\tau)) = \frac{1}{101}\sum_{u=1}^{101}\left(1 - \frac{u-1}{100}\right) \cdot \hat{f}_Y\left(\frac{u-1}{100} \cdot x'\hat\beta(\tau) + \left(1 - \frac{u-1}{100}\right) \cdot \hat{Q}_\tau(Y|X = x)\,\Big|\,X = x\right). \tag{110}$$
We used kernel density estimates of $f_Y(y|X = x)$ with a Gaussian kernel and bandwidth (h) determined
by

$$h = 0.9 \cdot \min\left\{\sqrt{Var[Y - Q_\tau(Y|X = x)|X = x]},\ \frac{IQR_{0.25,0.75}[Y - Q_\tau(Y|X = x)|X = x]}{1.349}\right\} \cdot n^{-1/5}. \tag{111}$$

This bandwidth choice is optimal in the sense that it minimizes mean integrated square error with
Gaussian data and a Gaussian kernel (Silverman, 1986). The density weights were calculated similarly.
Sampling weights were used in the estimation of conditional densities for the 2000 census sample.

To calculate weights for partial quantile correlation, $w_\tau(X)$, we also use a discrete approximation of
the average density of the response variable representation. In particular, we have

$$\hat{w}_\tau(x) \approx \frac{1}{101}\sum_{u=1}^{101}\frac{1}{2} \cdot \hat{f}_Y\left(\frac{u-1}{100} \cdot x'\hat\beta(\tau) + \left(1 - \frac{u-1}{100}\right) \cdot \hat{Q}_\tau(Y|X = x)\,\Big|\,X = x\right), \tag{113}$$

where the conditional densities are estimated using the same kernel method as for the importance weights.
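
The two discrete approximations described in this appendix can be sketched for a single covariate cell as follows. This is a minimal illustration, not the authors' code: `sample` stands for the observed values of Y in one cell, `cqf_hat` and `qr_fit` stand for $\hat{Q}_\tau(Y|X=x)$ and $x'\hat\beta(\tau)$, and the bandwidth uses the standard Silverman form $0.9 \cdot \min(\text{sd}, \text{IQR}/1.349) \cdot n^{-1/5}$, where taking n to be the cell size is an assumption of this sketch. Sampling weights are ignored.

```python
import math
from statistics import quantiles, stdev


def silverman_bandwidth(sample):
    """Rule-of-thumb bandwidth; n is taken to be the cell size (an assumption)."""
    q1, _, q3 = quantiles(sample, n=4)
    spread = min(stdev(sample), (q3 - q1) / 1.349)
    return 0.9 * spread * len(sample) ** (-0.2)


def kernel_density(point, sample, bandwidth):
    """Gaussian kernel density estimate of f_Y(point) within the cell."""
    norm = 1.0 / (len(sample) * bandwidth * math.sqrt(2.0 * math.pi))
    return norm * sum(math.exp(-0.5 * ((point - y) / bandwidth) ** 2) for y in sample)


def qr_weights(sample, cqf_hat, qr_fit, grid_size=101):
    """Return (importance weight, density weight) on a grid between CQF and QR fit.

    Importance weight: average of (1 - u) * f_hat over the grid.
    Density weight:    average of (1/2) * f_hat over the grid.
    """
    h = silverman_bandwidth(sample)
    importance = 0.0
    density = 0.0
    for j in range(grid_size):
        u = j / (grid_size - 1)
        f_hat = kernel_density(u * qr_fit + (1.0 - u) * cqf_hat, sample, h)
        importance += (1.0 - u) * f_hat
        density += 0.5 * f_hat
    return importance / grid_size, density / grid_size


if __name__ == "__main__":
    # Deterministic, purely illustrative "cell" of log-earnings values.
    cell = [6.0 + 0.7 * math.sin(i) + 0.002 * i for i in range(200)]
    print(qr_weights(cell, cqf_hat=6.1, qr_fit=6.3))
```

When the QR fit coincides with the CQF in a cell, every grid point is the same and both weights reduce to one half of the estimated density at the quantile, which is a useful sanity check.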


C     Appendix:  Sampling  Weights 

In  order  to  take  into  account  the  weighted  structure  of  the  census  2000  sample,  the  estimators  for  the 
components  of  the  variance  formulae  in  Table  1  were  modified  as  follows 

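$$\hat\Sigma(\tau, \tau') = \frac{1}{n}\sum_{i=1}^n w_i^2 \cdot (\tau - 1\{Y_i < X_i'\hat\beta(\tau)\})(\tau' - 1\{Y_i < X_i'\hat\beta(\tau')\}) \cdot X_iX_i', \tag{114}$$

$$\hat\Sigma_0(\tau, \tau') = [\min(\tau, \tau') - \tau\tau'] \cdot \frac{1}{n}\sum_{i=1}^n w_i^2 \cdot X_iX_i', \tag{115}$$

$$\hat{J}(\tau) = \frac{1}{2nh_n}\sum_{i=1}^n w_i \cdot 1\{|Y_i - X_i'\hat\beta(\tau)| \le h_n\} \cdot X_iX_i', \tag{116}$$

where $w_i$ are the sampling weights (normalized to add to n). Other calculations involving the 2000 sample
use sampling weights in the standard way.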
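
A weighted version of the Jacobian estimator in (116) can be sketched in a few lines. The code below is illustrative only: the function name, argument names and the list-of-lists matrix representation are choices made for this example, and the sampling weights are assumed to be normalized to add to n. With all weights equal to one it reduces to the unweighted estimator in (103).

```python
def weighted_jacobian(y, x, beta_hat, bandwidth, weights=None):
    """J_hat = 1/(2 n h) * sum_i w_i * 1{|y_i - x_i'beta_hat| <= h} * x_i x_i'.

    y: list of outcomes; x: list of regressor vectors (lists of equal length);
    weights: sampling weights normalized to add to n (defaults to all ones).
    Returns a d x d matrix as a list of lists.
    """
    n = len(y)
    d = len(beta_hat)
    if weights is None:
        weights = [1.0] * n
    jac = [[0.0] * d for _ in range(d)]
    for y_i, x_i, w_i in zip(y, x, weights):
        fitted = sum(a * b for a, b in zip(x_i, beta_hat))
        if abs(y_i - fitted) <= bandwidth:  # observation falls in the kernel window
            for r in range(d):
                for c in range(d):
                    jac[r][c] += w_i * x_i[r] * x_i[c]
    scale = 1.0 / (2.0 * n * bandwidth)
    return [[scale * value for value in row] for row in jac]


if __name__ == "__main__":
    y = [0.0, 1.0, 5.0]
    x = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
    print(weighted_jacobian(y, x, beta_hat=[0.0, 1.0], bandwidth=0.5))
    print(weighted_jacobian(y, x, [0.0, 1.0], 0.5, weights=[2.0, 1.0, 0.0]))
```

In the toy data only the first two observations lie within the bandwidth of the fitted line, so only they contribute outer products to the estimate.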

D     Appendix:  Data 

The data were drawn from the 1% self-weighting 1980 and 1990 samples, and the 1% weighted 2000
sample, all from the IPUMS website (Ruggles et al., 2003). The sample for most of the calculations
consists of US-born black and white men aged 40-49 with at least 5 years of education, with positive
annual earnings and hours worked in the year preceding the census, and with nonzero sampling weight.
Individuals with imputed values for age, education, earnings or weeks worked were also excluded from
the sample. After this selection process, the final sample sizes were 65,023, 86,785 and 97,397 for 1980,
1990 and 2000.

The  log-earnings  variable  is  the  average  log  weekly  wage  and  was  calculated  as  the  log  of  the  reported 
annual  income  from  work  divided  by  weeks  worked  in  the  previous  year.  Annual  income  is  expressed  in 
1989  dollars  using  the  Personal  Consumption  Expenditures  Price  Index. 

The education variable for 1980 corresponds to the highest grade of school completed, coded as
follows:

Years of schooling      Highest grade of school completed

5                       5th grade of Elementary School
6                       6th grade of Elementary School
7                       7th grade of Elementary School
8                       8th grade of Elementary School
9                       9th grade of High School
10                      10th grade of High School
11                      11th grade of High School
12                      12th grade of High School
13                      1st year of College
14                      2nd year of College
15                      3rd year of College
16                      4th year of College
17                      5th year of College
18                      6th year of College
19                      7th year of College
20                      8th or more year of College

For the purposes of Figure 5 and most of the empirical work, years of schooling for the 1990 and 2000 censuses
were imputed from categorical schooling variables as follows:

Years of schooling      Educational attainment

8                       5th, 6th, 7th, or 8th grade
9                       9th grade
10                      10th grade
11                      11th or 12th grade, no diploma
12                      High school graduate, diploma or GED
13                      Some college, but no degree
14                      Completed associate degree in college, occupational program
15                      Completed associate degree in college, academic program
16                      Completed bachelor's degree, not attending school
17                      Completed bachelor's degree, but now enrolled
18                      Completed master's degree
19                      Completed professional degree
20                      Completed doctorate

For  the  purposes  of  Panel  C  in  Figure  6,  we  modify  this  slightly,  coding  5th-8th  grade  as  8  and  2-3  years 
in  college  as  14  in  1980,  and  coding  the  categories  associate  college  degree,  occupational  program,  and 
associate  degree,  academic  program,  as  14  in  1990.  These  changes  generate  schooling  variables  with  the 
same  range  and  points  of  support  in  all  3  years. 
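
The imputation table and the Panel C recodes described above can be written as a small lookup and a recoding function. This is a sketch under stated assumptions: the dictionary keys are the category labels as printed in the table, not IPUMS variable codes, and the function name is hypothetical. The recodes implement only what the text states for 1980 and 1990; other census years are returned unchanged because the text does not specify a rule for them.

```python
# Years of schooling imputed from categorical attainment (1990 and 2000 censuses).
IMPUTED_YEARS = {
    "5th, 6th, 7th, or 8th grade": 8,
    "9th grade": 9,
    "10th grade": 10,
    "11th or 12th grade, no diploma": 11,
    "High school graduate, diploma or GED": 12,
    "Some college, but no degree": 13,
    "Completed associate degree in college, occupational program": 14,
    "Completed associate degree in college, academic program": 15,
    "Completed bachelor's degree, not attending school": 16,
    "Completed bachelor's degree, but now enrolled": 17,
    "Completed master's degree": 18,
    "Completed professional degree": 19,
    "Completed doctorate": 20,
}


def harmonize_years(years, census_year):
    """Recode years of schooling as described for Panel C of Figure 6.

    1980: 5th-8th grade coded as 8; 2-3 years of college coded as 14.
    1990: both associate-degree categories coded as 14.
    Other years: returned unchanged (no rule is stated in the text).
    """
    if census_year == 1980:
        if 5 <= years <= 8:
            return 8
        if years in (14, 15):
            return 14
    elif census_year == 1990:
        if years == 15:
            return 14
    return years


if __name__ == "__main__":
    print(harmonize_years(6, 1980), harmonize_years(15, 1980), harmonize_years(15, 1990))
```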
Table 1: Human capital earnings function:
Estimates of schooling coefficients and standard errors (%)

                  Desc. Stats.          Quantile Regression Estimates                  OLS Estimate
Census  Obs.      Mean   SD       0.1       0.25      0.5       0.75      0.9        Coeff.    Root…

A. Without controls

1980    65,023    6.40   0.67     7.48      7.13      6.39      6.56      7.42       6.98      0.…
                                  (0.223)   (0.078)   (0.067)   (0.069)   (0.100)    (0.080)
                                  [0.239]   [0.081]   [0.067]   [0.070]   [0.110]    [0.087]

1990    86,785    6.46   0.69     10.04     9.57      8.93      9.23      11.59      9.78      0.…
                                  (0.130)   (0.119)   (0.075)   (0.108)   (0.169)    (0.082)
                                  [0.135]   [0.121]   [0.075]   [0.107]   [0.178]    [0.087]

2000    97,397    6.50   0.75     9.80      10.57     11.05     11.89     15.51      11.71     0.…
                                  (0.201)   (0.129)   (0.109)   (0.115)   (0.624)    (0.092)
                                  [0.208]   [0.133]   [0.109]   [0.115]   [0.669]    [0.113]

B. Controlling for race and quadratic function of potential experience

1980    65,023    6.40   0.67     7.35      7.35      6.83      7.01      7.91       7.20      0.…
                                  (0.190)   (0.120)   (0.099)   (0.104)   (0.145)    (0.120)
                                  [0.199]   [0.123]   [0.099]   [0.106]   [0.153]    [0.127]

1990    86,785    6.46   0.69     11.15     10.96     10.62     11.08     13.69      11.36     0.…
                                  (0.274)   (0.123)   (0.104)   (0.149)   (0.252)    (0.117)
                                  [0.285]   [0.126]   [0.104]   [0.148]   [0.263]    [0.122]

2000    97,397    6.50   0.75     9.16      10.49     11.13     11.95     15.73      11.44     0.…
                                  (0.195)   (0.120)   (0.126)   (0.134)   (0.385)    (0.117)
                                  [0.204]   [0.122]   [0.126]   [0.134]   [0.401]    [0.141]

Notes: US-born white and black men aged 40-49. Standard Errors in parentheses. Standard Errors robust to misspecification
in brackets. Sampling weights used for 2000 Census.


Table 2: Comparison of CQF and QR-based Interquantile Spreads

                                          Interquantile Spread
                                90-10          75-25          90-50          50-10
Census   Obs.      Controls     CQ     QR      CQ     QR      CQ     QR      CQ     QR

A. Overall

1980     65,023    No           1.20   1.20    0.56   0.56    0.51   0.52    0.69   0.68
                   Yes          1.20   1.19    0.56   0.55    0.52   0.51    0.68   0.67
1990     86,785    No           1.37   1.36    0.65   0.65    0.61   0.61    0.76   0.75
                   Yes          1.35   1.35    0.64   0.64    0.60   0.61    0.75   0.74
2000     97,397    No           1.45   1.45    0.71   0.70    0.68   0.69    0.77   0.76
                   Yes          1.43   1.45    0.70   0.68    0.67   0.70    0.76   0.75

B. High School Graduates

1980     25,020    No           1.10   1.20    0.51   0.57    0.42   0.51    0.67   0.69
                   Yes          1.09   1.17    0.52   0.55    0.44   0.50    0.65   0.67
1990     22,837    No           1.27   1.33    0.64   0.66    0.51   0.56    0.76   0.77
                   Yes          1.26   1.31    0.63   0.64    0.52   0.55    0.74   0.76
2000     25,963    No           1.32   1.34    0.68   0.67    0.60   0.61    0.72   0.73
                   Yes          1.29   1.32    0.66   0.66    0.59   0.60    0.70   0.72

C. College Graduates

1980     7,158     No           1.25   1.19    0.60   0.54    0.58   0.55    0.67   0.64
                   Yes          1.26   1.19    0.59   0.53    0.61   0.54    0.65   0.64
1990     15,517    No           1.49   1.40    0.68   0.64    0.69   0.67    0.77   0.73
                   Yes          1.44   1.38    0.66   0.63    0.70   0.66    0.74   0.72
2000     19,388    No           1.57   1.57    0.73   0.72    0.75   0.78    0.82   0.78
                   Yes          1.55   1.57    0.74   0.71    0.75   0.80    0.80   0.78

Notes: US-born white and black men aged 40-49. Average measures calculated using the distribution
of the covariates in each year. The covariates are schooling (controls = No) or schooling, race and a
quadratic function of experience (controls = Yes). Sampling weights used for 2000 Census.


Table 3: Measures of Between-group (Model) and Within-group (Residual)
Inequality and Linear (Quantile) Regression Approximations

                              Quantile-based Measures                            ANOVA
                   90-10          75-25          90-50          50-10          Cond.    OLS
Census   Obs.      CQ     QR      CQ     QR      CQ     QR      CQ     QR      Mean     Fit

A. Between-group Inequality

1980     65,023    0.60   0.59    0.15   0.23    0.35   0.32    0.25   0.27    0.24     0.23
1990     86,785    0.63   0.65    0.33   0.35    0.37   0.41    0.27   0.24    0.28     0.27
2000     97,397    0.66   0.75    0.51   0.43    0.42   0.53    0.24   0.22    0.30     0.29

B. Within-group Inequality

1980     65,023    1.14   1.17    0.52   0.54    0.49   0.51    0.65   0.66    0.63     0.63
1990     86,785    1.32   1.35    0.62   0.63    0.57   0.59    0.73   0.75    0.63     0.64
2000     97,397    1.38   1.41    0.67   0.67    0.64   0.66    0.73   0.75    0.68     0.69

C. Relative Importance of Within-group Inequality (RTR and 1-R²), in percent

1980     65,023    78     80      93     85      65     72      87     86      87       88
1990     86,785    81     81      78     76      71     68      88     90      84       85
2000     97,397    82     78      63     71      70     61      90     92      84       85

Notes: US-born white and black men aged 40-49. Measures calculated in a model that includes
schooling, race and experience. Relative measures calculated as the square of Panel B divided
by the sum of the square of Panel A and the square of Panel B. Sampling weights used for 2000
census.
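
The relative measure defined in the notes to Table 3 is simple arithmetic, and a short calculation reproduces Panel C from Panels A and B. The sketch below uses the 1980 entries of the table as inputs; the function name is a label chosen for this example.

```python
def relative_importance(between, within):
    """Percent share of within-group inequality: within^2 / (between^2 + within^2)."""
    return 100.0 * within ** 2 / (between ** 2 + within ** 2)


if __name__ == "__main__":
    # 1980 entries from Panels A and B of Table 3.
    print(round(relative_importance(0.60, 1.14)))  # 90-10 spread, CQ-based
    print(round(relative_importance(0.59, 1.17)))  # 90-10 spread, QR-based
    print(round(relative_importance(0.24, 0.63)))  # ANOVA, conditional mean
```

Because the measure is a ratio of squares, it lies between 0 and 100 by construction, unlike a ratio that uses the marginal interquantile range as the denominator.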


[Figure 1 plot. Recoverable labels: panel titles "A. tau = 0.10" and "B. tau = 0.25"; horizontal axis "Schooling"; legend entries CQ, KBQR, CQR.]

Figure 1: CQF and CEF in 1980 Census (US-born white and black men aged 40-49).
Panels A - E plot the Conditional Quantile Function, Koenker and Bassett's Quantile
Regression fit and Chamberlain's Minimum Distance fit for weekly log-earnings given
years of schooling. Panel F plots the Conditional Expectation Function (CEF),
Weighted LS fit and OLS fit for weekly log-earnings given years of schooling.
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Figure  2:  Weighting  Functions  in  1980  Census  (US-born  white  and  black  men  aged  40-49). 
Panels  A-E  plot  the  histogram  of  years  of  schooling,  QR  weighting  function  and  importance 
weighting   function   for   QR's   of  log-earnings   on   years  of  schooling.    Panel  F   plots   the 
histogram   of  years   of  schooling,    WLS    weighting   function  and   inverse  of  the  conditional 
variance  for  the  linear  regression  of  log-earnings  on  years  of  schooling. 
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Figure  3:  Importance  and  Density  Weights  in  1980  Census  (US-born  white  and  black 
men  aged  40-49). 

Figure 4: Partial Quantile Correlation Plots in 1980 Census (US-born white men aged 30-54).
Panels A-E plot the Partial Conditional Quantile Function and Partial QR fit of log-earnings on
years of schooling, controlling for a quadratic function of experience. The dashed line has the
same slope as a QR line of log-earnings on years of schooling without controlling for experience.
Panel F plots the Partial Conditional Expectation Function and Partial OLS fit of log-earnings
on years of schooling, controlling for a quadratic function of experience. The dashed line has
the same slope as an OLS line of log-earnings on years of schooling without controlling for
experience.

Figure 7: CQF and CEF in 1990 Census (US-born white and black men aged 40-49).
Panels A-E plot the Conditional Quantile Function, Koenker and Bassett's Quantile
Regression fit and Chamberlain's Minimum Distance fit for weekly log-earnings given
years of schooling. Panel F plots the Conditional Expectation Function (CEF),
Weighted LS fit and OLS fit for weekly log-earnings given years of schooling.

Figure 8: CQF and CEF in 2000 Census (US-born white and black men aged 40-49).
Panels A-E plot the Conditional Quantile Function, Koenker and Bassett's Quantile
Regression fit and Chamberlain's Minimum Distance fit for weekly log-earnings given
years of schooling. Panel F plots the Conditional Expectation Function (CEF),
Weighted LS fit and OLS fit for weekly log-earnings given years of schooling.

Figure 9: Weighting Functions in 1990 Census (US-born white and black men aged 40-49).
Panels A-E plot the histogram of years of schooling, QR weighting function and importance
weighting function for QR's of log-earnings on years of schooling. Panel F plots the
histogram of years of schooling, WLS weighting function and inverse of the conditional
variance for the linear regression of log-earnings on years of schooling.

Figure 10: Weighting Functions in 2000 Census (US-born white and black men aged 40-49).
Panels A-E plot the histogram of years of schooling, QR weighting function and importance
weighting function for QR's of log-earnings on years of schooling. Panel F plots the
histogram of years of schooling, WLS weighting function and inverse of the conditional
variance for the linear regression of log-earnings on years of schooling.

Figure 11: Importance and Density Weights in 1990 Census (US-born white and black
men aged 40-49).

Figure 12: Importance and Density Weights in 2000 Census (US-born white and black
men aged 40-49).